Sir: 



This Reply Brief is filed in response to the "Examiner's Answer" mailed August 



A Supplemental Amendment was filed October 29, 2001 to cancel subject matter 
from claims 2, 9, 18, and 19. This amendment was filed because Appellants have noted 
that some of the subject matter in these claims reads on subject matter disclosed in PCT 
publication WO9929849A, cited in the Supplemental Form PTO-1449 mailed March 23, 
2000, and the corresponding U.S. patent application, which has now issued as U.S. Patent 
No. 6,063,596. Accordingly, Appellants have amended the claims to reduce the issues on 
appeal. 



The Examiner has raised several new arguments in the Examiner's Answer and 
has renewed grounds for the rejection that were previously overcome by the Appellants. 
In the final Office Action, the Examiner states, "Applicants have not clearly 
demonstrated that the cloned nucleic acid and its encoded polypeptide is actually a GPCR 
as was noted in the utility rejection," but "Applicants do indeed provide multiple well 
established and specific utilities for a GPCR." See, Office Action dated February 12, 
2001, page 3. In the Examiner's Answer, the Examiner agreed with Appellants' 
statement of the issues in the Appeal Brief that the single issue was whether 2871 is a 
GPCR (Examiner's Answer, page 2). However, the Examiner proceeds to set forth a new 
ground for the utility rejection, indicating that even if Applicants establish 2871 as a 
GPCR, members of the GPCR family of polypeptides do not have well-established 
utility. See, Examiner's Answer, pages 4-5. 

The Examiner also cites a new reference in the Examiner's Answer in support of 
the argument that a protein's sequence may not be used to predict its function (Attwood 
(2000) Science 290:471-473). The Examiner's Answer further includes citations to two 
references that were cited in the first Office Action as a grounds for rejection under 35 
U.S.C. § 101, but that were not cited in the final Office Action. The fact that these 
references were not cited in the final Office Action led Appellants to believe that the 
rejection had been overcome to the extent that it was based on these references, and 
therefore the Examiner's arguments were not addressed in detail in the Appeal Brief. 

Appellants note that this change in direction by the Examiner, which was not 
explained, is a practice which makes patent prosecution more difficult. This practice 
serves to obscure the basis for the rejection and runs the risk of unfairly prejudicing 
applicants' nascent property rights in their patentable subject matter. As stated by the 
Federal Circuit in In re Oetiker, "[t]he examiner cannot sit mum, leaving the applicant to 
shoot arrows in the dark hoping to somehow hit a secret objection harbored by the 
examiner." 977 F.2d 1443, 24 USPQ2d 1443, 1447 (Fed. Cir. 1992) (Plager, J., 
concurring). 

Because the Examiner previously admitted that GPCRs have "multiple well- 
established and specific utilities," Appellants did not fully address the utility of GPCRs in 
the Appeal Brief. It is requested that the rejection be withdrawn or prosecution be 
reopened to give Appellants a fair opportunity to respond to the new and renewed 
grounds of rejection. However, should the rejection not be withdrawn and prosecution 
not be reopened, Applicants here present these arguments in response to the Examiner's 
new and revived grounds of rejection. Responses to the Examiner's new and revived 
arguments are addressed below in section I, while the issue of utility of the present 
invention is discussed below in section II. 

ARGUMENTS 

I. 2871 Encodes a G-Protein Coupled Receptor 

A. The evidence presented by the Examiner does not address the 
methods used by the Appellants to determine 2871 receptor function. 

1. Berendsen is directed to the de novo prediction of protein tertiary 
structure from primary structure, not the prediction of protein function based on the 
presence of conserved functional domains. 

In the first office action, the Examiner cited Berendsen (1998) Science 282:642- 
643 in support of the argument that protein activity predictions based on functional 
domains are unpredictable. This reference was not cited in the final Office Action, 
leading the Appellants to believe that the rejection was overcome to the extent that it was 
based on this reference. However, the Examiner has cited the reference in the Examiner's 
Answer, and thus the arguments relating to this reference will be addressed here. The 
teachings of Berendsen are directed to methods of predicting a protein's tertiary 
structure from its primary sequence. Berendsen states, "[t]he prediction of the native 
conformation of a protein of known amino acid sequence is one of the great open 
questions in molecular biology and one of the most demanding challenges in the new 
field of bioinformatics," (Berendsen, id. at 642) and then proceeds to discuss computer 
simulations of protein folding. 

In the Examiner's Answer, the Examiner appears to acknowledge that Berendsen 
is not directed to the functional domain based predictions of protein function utilized by 
Appellants, but notes that "the activity of any protein or polypeptide is dependent on its 
structure." (August 28, 2001 Office Action, page 8) While Appellants agree that some 
regions of a protein must retain a certain conformation in order for the protein to be 
active, it does not follow that a protein's tertiary structure must be known in order to 
determine the activity of that protein. In fact, three-dimensional structures have been 
elucidated for only a very few of the thousands of proteins having known biochemical or 
physiological activity. Accordingly, the teachings regarding structural predictions found 
in Berendsen are not relevant to methods for predicting protein function used by the 
Appellants. 

2. Galperin et al. is directed to context-based methods of predicting 
protein function, not to predictions of protein function based on the presence of 
functional domains. 

In the first office action, the Examiner cited Galperin et al. (2000) Nature 
Biotechnology 18:609-613 in support of the argument that a protein's function cannot be 
predicted from the presence of conserved functional domains. This reference was not 
cited in the final Office Action, leading the Appellants to believe that the rejection was 
overcome to the extent that it was based on this reference. However, the Examiner has 
cited the reference in the Examiner's Answer, and thus the arguments relating to this 
reference will be addressed here. The teachings of Galperin et al. are directed to the 
prediction of protein function using comparative genomic approaches. The abstract for 
the Galperin et al reference states, "[s]everal recently developed computational 
approaches in comparative genomics go beyond sequence comparison. By analyzing 
phylogenetic profiles of protein families, domain fusions, gene adjacency in genomes, 
and expression patterns, these methods predict many functional interactions between 
proteins and help deduce specific functions for numerous proteins." The authors then 
proceed to discuss the strengths and weaknesses of these genomic context-based methods 
of functional prediction. Accordingly, the primary teachings of Galperin et al. are not 
directed to the methods used by the Appellants to predict 2871 function. 

In rebutting Appellants' arguments regarding Galperin et al., the Examiner 
notes that the authors teach that "sequence comparison methods, even the best ones, are 
of little help when a protein has no homologs in current databases or when all hits are to 
uncharacterized gene products." While Appellants agree that sequence similarity with 
uncharacterized gene products cannot be used to determine a protein's activity, this caveat 
does not apply to the methods used to determine the function of the 2871 receptor. In the 
present case, the function of the 2871 receptor was determined based on the presence of 
sequence similarity with a conserved functional domain characteristic of the rhodopsin 
family of GPCR's. As described fully in Appellants' Appeal Brief and illustrated in 
Appendix E of the same, this signature pattern was elucidated from the sequences of a 
number of rhodopsin-family GPCR's having known biochemical activities. Accordingly, 
this statement by Galperin et al. does not undermine the reliability of the methods of 
functional prediction used by the Appellants. 

The only additional teaching that Galperin et al. provide regarding prediction of 
protein function based on sequence similarity with proteins of known function is also 
supportive of the diagnostic potency and reliability of these methods. On page 613, 
column 1, of the Galperin et al reference, the authors state that comparative genomic 
methods of predicting protein function discussed in the reference "provide a useful 
extension of, and in a sense a genome-based framework for, sequence and structural 
methods which remain the cornerstone of computational genomics." This statement 
demonstrates that Galperin et al. distinguish between the reliability of the comparative 
genomics-based methods of functional prediction reviewed in the reference and the 
pattern based methods for functional prediction used by the Appellants, and further 
demonstrates that the authors consider the approach used by the Appellants to be reliable. 



3. Attwood distinguishes between the reliability of module-based 
prediction of protein function and pattern-based prediction of protein function and 
presents arguments supporting the diagnostic reliability of pattern databases. 

The Examiner's Answer includes a citation to a new reference, Attwood (2000) 
Science 290:471-473, in support of the argument that sequence similarity cannot be used 
to predict protein function. Specifically, the Examiner cites the statement on page 472, 
column 2, of this reference which states, "[i]f the best hit in a database search is a match 
to a single domain module, it is unlikely that the function annotation can be propagated 
from the parent protein to the query sequence," and "[t]he presence of a module tells little 
of the function of the complete system; knowing most of the components of a mosaic 
does not allow us easily to predict a missing one, and modules in different proteins do 
not always perform the same function." A careful reading of the Attwood reference 
makes it clear that these statements refer not to the prediction of protein function based 
on the presence of a conserved functional domain, but rather to the prediction of 
function based on the presence of a single motif or module. Such modules are defined by 
Attwood as "autonomous folding units that often function as protein building blocks, 
forming multiple combinations of the same module or mosaics of different modules." 
(Attwood, id.) In the present case, the Appellants have determined the function of the 
2871 receptor based on the fact that 255 contiguous amino acids of the 2871 polypeptide 
provide an excellent fit to the Pfam model of the rhodopsin family of GPCRs (see Figure 
2). The Pfam model is not based solely on the presence of a single autonomous folding 
unit. 

The differences between the reliability of motif or module-based methods of 
protein function prediction and functional domain-based methods of function prediction 
are discussed in greater detail in Attwood (2000) Int. J. Biochem. Cell Biol. 32:139-155 
(provided as Appendix G), a more comprehensive review article published by Attwood in 
the same year as the reference cited by the Examiner. In this reference, Attwood teaches 
that while functional prediction methods based on the presence of a single motif may be 
problematic because matches to single motifs lack biological context (see Attwood, id., at 
144), pattern databases such as Pfam overcome many of the flaws inherent in these single 
motif-based methods. Attwood states: 

"[p]attern databases offer several benefits (i) by distilling 
multiple sequence information into family descriptors, 
trivial errors in the underlying sequences may be diluted; 
(ii) annotation errors may be quickly spotted if the 
description of one sequence differs from that of its family 
and (iii) they allow specific diagnoses, placing individual 
sequences in a family context for a more informed 
assessment of possible function." 
Attwood, id. at 153. 

Attwood also teaches the diagnostic advantages of manually-generated databases 
such as Pfam (which is based on hand-edited seed alignments; see Attwood, id at 149). 
Attwood states, "manually annotated databases are set apart from their automatically 
created counterparts by virtue of (i) providing validation of results and (ii) offering 
detailed information that helps to place conserved sequence information in structural or 
functional contexts." (Attwood, id. at 152). Attwood further states that while pattern 
databases are small in comparison with sequence repositories, "their diagnostic potency 
ensures that pattern databases will play an increasingly important role as the post-genome 
quest to assign functional information to raw sequence data gains pace." (Attwood, id. at 
153, 154) Thus the teachings by Attwood regarding pattern databases, particularly 
manually-generated pattern databases, are strongly supportive of the reliability of these 
techniques. 

Thus, the Examiner seizes on a single brief review article by Attwood about 
caveats of sequence comparison methods to discredit sequence comparison methods in 
general (Examiner's Answer, page 7, "protein function cannot be ascertained from 
analysis of its components.") Applicants agree generally with Attwood's argument in the 
new reference cited by the Examiner that predictions of protein function based on a single 
motif are not necessarily reliable. However, those of skill in the art distinguish between 
the presence of a single motif in a protein and the presence of configurations of multiple 
motifs, or a pattern, which is diagnostic of a particular protein family. 

Attwood has published a number of articles describing patterns that are diagnostic 
of G-protein coupled receptors,¹ and is known as one of the creators of the PRINTS 
sequence comparison method and database. Perhaps most pertinent here is an article 
published by Attwood after the article cited by the Examiner, entitled: "A compendium 
of specific motifs for diagnosing GPCR subtypes." Attwood (2001) TRENDS in 
Pharmacological Science 22(4): 162-165, provided as Appendix H. In this article, 
Attwood discusses the differences between several sequence comparison methods and 
describes the use of her PRINTS methods and database for the analysis of GPCRs 
(available at http://bioinf.man.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/muppet/FPScan.cgi, 
as indicated in Figure 1). See Attwood at 164. 

¹ Attwood's work includes: Attwood and Beck (1994), Protein Eng. 7(7): 841-848, entitled "PRINTS: a 
protein motif fingerprint database"; Attwood and Findlay (1994) Protein Eng. 7(2): 195-203, entitled 
"Fingerprinting G-protein Coupled Receptors"; Attwood et al. (1991) Gene 98(2): 153-159, entitled 
"Multiple Sequence Alignment of Protein Families Showing Low Sequence Homology: A Methodological 
Approach Using Database Pattern-matching Discriminators for G-protein-linked Receptors." 

A PRINTS analysis of the closest publicly disclosed polypeptide sequence to the 
subject of the present application (i.e., the sequence disclosed in U.S. Patent No. 
6,063,596 as SEQ ID NO: 3) shows an identification of the "GPCRRHODOPSN" 
fingerprint, with an E-value of 3.1e-29 and a P-value of 1.2e-34 (see output, attached as 
Appendix I). As indicated in the documentation for PRINTS also available at this site, 
"[t]he reported P-value of any fingerprint result is the product of the p-values for each 
motif. The motif p-values represent the probability that a comparison between the motif 
and a random sequence would achieve a score greater than or equal to the score attributed 
to the match between your query sequence and the motif." The E-value is the expected 
number of occurrences of sequences scoring greater than or equal to the query's score. 
Thus, the very low P-value and E-value obtained from Attwood's PRINTS analysis 
concur with the Pfam diagnosis described by Applicants that the 2871 sequence is a 
GPCR. Accordingly, the Examiner's use of Attwood to discredit sequence comparison 
methods in general is inconsistent with Attwood's work, which strongly supports the 
conclusion that the 2871 sequence is a GPCR. 
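
By way of illustration only, the rule quoted above from the PRINTS documentation (the fingerprint P-value is the product of the individual motif p-values) can be sketched as follows. The motif p-values in this sketch are hypothetical placeholders chosen for the arithmetic; they are not the values reported in Appendix I, and the function name is an assumption made for this example rather than part of the PRINTS software.

```python
import math


def fingerprint_p_value(motif_p_values):
    """Combine per-motif p-values into one fingerprint P-value.

    Follows the rule quoted from the PRINTS documentation: the reported
    P-value is the product of the p-values of the individual motifs.
    """
    if not motif_p_values:
        raise ValueError("at least one motif p-value is required")
    for p in motif_p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
    return math.prod(motif_p_values)


# Hypothetical p-values for four motifs (illustrative only, not Appendix I data).
hypothetical_motif_p_values = [1e-5, 2e-6, 5e-4, 1e-7]
combined_p = fingerprint_p_value(hypothetical_motif_p_values)
print(f"combined fingerprint P-value: {combined_p:.1e}")
```

Because each factor is at most 1, every additional matching motif can only lower the combined P-value, which is why a match to a multi-motif fingerprint is more diagnostic than a match to any single motif.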

4. The Examiner's failure to credit the predictive power of sequence 
comparison methods is at odds with accepted practice in the art. 

The Examiner notes (Examiner's Answer, paper number 18 mailed August 28, 
2001, page 3) that "the specification discloses that the cloned GPCR shares a high score 
with the seven transmembrane rhodopsin family," and further states on page 4 that "the 
specification notes that proteins with putative seven transmembrane domains, much like 
applicants, are not necessarily GPCRs such as boss and fz cloned from Drosophila." The 
Examiner also states (Examiner's Answer, pages 6-7) that "Figure 2 provides for only the 
DRY triplet and low sequence homology." Based partly on this line of reasoning, the 
Examiner asserts that the specification lacks "a specific and substantial utility [and] a 
well established utility." 

This line of reasoning by the Examiner is inconsistent with the understanding of 
one of skill in the art of Pfam alignments, and of sequence comparisons in general. As 
known to those of skill in the art (and described in the Pfam documentation available at 
http://pfam.wustl.edu/faq.shtml), Pfam alignments do not display homology between 
pairs of sequences but rather display the fit of a particular query sequence to a particular 
protein family model. As discussed on the Pfam "Help Page:FAQ" available at the 
address above, complaints [like the Examiner's present complaint] about the quality of 
the alignments generally arise "because people aren't used to looking at multiple 
alignments of hundreds or thousands of sequences. Remember that a rare insertion in 
even just one sequence [in the protein family] means having to open a gap in the whole 
alignment: Pfam full alignments look very gappy for this reason, but in fact they're not." 

The Examiner also ignores that boss (bride of sevenless) and fz (frizzled) show 
low similarities to GPCR domains in Pfam alignments. One of skill in the art 
understands that Pfam alignments of boss and frizzled with the highest-scoring seven 
transmembrane domain models for each (7tm_3 and 7tm_2, respectively) have negative 
scores. In contrast, the 2871 sequence has a high positive score for the rhodopsin 
subfamily that is described by Pfam model 7tm_1. Pfam "bit scores" represent the log 
base 2 of a ratio. In the numerator of this ratio is the probability of the sequence given 
the hypothesis that the sequence belongs to the protein family being modeled. In the 
denominator of this ratio is the probability of the sequence given the hypothesis that the 
sequence was generated according to a random background model. Thus, the bit score of 
183 for protein 2871 with the Pfam 7tm_1 model means this protein sequence is 2^183 
times more likely to be observed if it were generated by the 7tm_1 model than if the 
sequence were generated by the random background model. We note that 2^183 (about 
1.2 x 10^55) greatly exceeds the estimated number of atoms comprised by the planet Earth. 
In contrast, the optimal score for boss to a GPCR family is -53, and the optimal score for 
frizzled is even lower, at -112. In other words, the sequence of boss is 2^53 times more 
likely to be observed if it were generated by the random background model than if it were 
generated by the best-fitting GPCR model. Although 2^53 does not exceed the estimated 
number of atoms that are comprised by the planet Earth, we note that 2^53 is an extremely 
large number (about 9 x 10^15). Thus, contrary to the Examiner's arguments, the fact that the 
boss and frizzled proteins have seven transmembrane domains does not detract from 
Applicants' evidence that the sequences of the present invention are GPCRs. 
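
The arithmetic behind the bit scores quoted above can be checked with a short sketch. It assumes only the definition given in the preceding paragraph (a bit score is the base-2 logarithm of the ratio of the probability under the family model to the probability under the random background model); the function name is chosen for this illustration and is not part of the Pfam software.

```python
def likelihood_ratio_from_bits(bit_score: float) -> float:
    """Convert a log2-odds bit score back into the underlying ratio

    P(sequence | family model) / P(sequence | random background model).
    """
    return 2.0 ** bit_score


# Bit scores quoted in the text above.
SCORE_2871 = 183     # 2871 against Pfam model 7tm_1
SCORE_BOSS = -53     # boss against its best-fitting GPCR model
SCORE_FRIZZLED = -112

ratio_2871 = likelihood_ratio_from_bits(SCORE_2871)
ratio_boss = likelihood_ratio_from_bits(SCORE_BOSS)
ratio_frizzled = likelihood_ratio_from_bits(SCORE_FRIZZLED)

print(f"2871: family model favored by a factor of {ratio_2871:.2e}")
# A negative score means the background model is favored; invert the ratio.
print(f"boss: background favored by a factor of {1 / ratio_boss:.2e}")
print(f"frizzled: background favored by a factor of {1 / ratio_frizzled:.2e}")
```

A positive score therefore favors the family model and a negative score favors the background model, and each additional bit doubles the ratio.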

The Examiner has attacked Applicants' use of sequence comparison methods by 
quoting caveats largely out of context. As one of skill in the art is aware, any 
methodology is fallible to some degree and there are always exceptions to a rule; thus, 
most if not all articles describing sequence comparison methods also discuss the 
shortcomings of those methods. The Examiner seizes on these caveats to discredit the 
use of sequence comparison methods. The Examiner's approach is at odds with that of 
the art, which has embraced sequence comparison methods, particularly as those methods 
have advanced in sophistication with the rapid advances of the genomic era. 

A brief survey of PubMed (accessible at http://www.ncbi.nlm.nih.gov/) shows 
dozens of peer-reviewed, scientific articles published every month describing novel 
discoveries of sequences having strong identity to sequences of known function. The 
acceptance of sequence comparison methods by the art is evidenced in many places. For 
example, Mount (2001) Bioinformatics: Sequence and Genome Analysis (Cold Spring 
Harbor Laboratory Press, Cold Spring Harbor, New York), page 282, provided as 
Appendix J, states that "[d]atabase similarity searches have become a mainstay of 
bioinformatics." Mount goes on to explain that, "[a]s a rough rule, if more than one-half 
of the amino acid sequence of query and database proteins is identical in the sequence 
alignments, the prediction is very strong. As the degree of similarity decreases, 
confidence in the prediction also decreases. The programs used for these database 
searches provide statistical evaluations that serve as a guide for evaluation of the 
alignment scores." As noted by Gusfield (1997) Algorithms on Strings, Trees, and 
Sequences: Computer Science and Computational Biology (Cambridge University Press, 
New York, New York), provided as Appendix K, at pages 212-213, 

"[s]equence comparison, particularly when combined with 
the systematic collection, curation, and search of databases 
containing biomolecular sequences, has become essential in 
modern molecular biology. * * * The first fact of 
biological sequence analysis: In biomolecular sequences 
(DNA, RNA, or amino acid sequences), high sequence 
similarity usually implies significant functional or 
structural similarity. Evolution reuses, builds on, 
duplicates, and modifies "successful" structures (proteins, 
exons, DNA regulatory sequences, morphological features, 
enzymatic pathways, etc.). 
* * * 
'Today, the most powerful method for inferring the 
biological function of a gene (or the protein that it encodes) 
is by sequence similarity searching on protein and DNA 
sequence databases. With the development of rapid 
methods for sequence comparison, both with heuristic 
algorithms and powerful parallel computers, discoveries 
based solely on sequence homology have become routine.' 
[citation omitted] * * * It is now standard practice, 
whenever a new gene is cloned and sequenced, to translate 
its DNA sequence into an amino acid sequence and then 
search for similarities between it and members of the 
protein databases." 

Another indicator of the importance of sequence comparison methods to the "new 
paradigm" of modern molecular biology is the fact that the most-cited paper of 1990- 
1998 is the publication describing BLAST: Altschul (1990) J. Mol. Biol. 215: 403, 
entitled "Basic Local Alignment Search Tool," provided as Appendix L (citation figures 
available at http://www.isinet.com/isi/hot/research). Accordingly, the Examiner's efforts 
to discredit sequence comparison methods in general are inconsistent with the art, which 
supports the use of sequence comparison methods and thus the conclusion that 2871 is a 
GPCR. 






In re: Glucksmann et al 
Appl.No.: 09/324,465 
Filing Date: June 2, 1999 
Page 12 



B. The scientific evidence presented by the Appellants demonstrates that 
sequence similarity within functional domains is a reliable predictor of protein 
function. 

In the Examiner's Answer, the Examiner presents new arguments addressing the use 
of sequence identity to predict protein function. These new arguments are addressed 
below. 

1. Despite the fact that the histamine receptor family is divergent, 
members of this family were identified as GPCRs based on sequence similarity with 
known GPCRs. 

In the Appeal Brief, Appellants cited Nguyen et al. (2001) Mol. Pharmacol. 
59:427-433 which describes the identification of the histamine receptor H4 based on 
sequence similarity with known GPCRs. In response, the Examiner has cited the 
teaching by Nguyen et al. that the histamine receptors H1, H2, and H3 share less than 
35% identity with one another and each has greater identity with other aminergic receptors. 
This statement by Nguyen et al. supports rather than discredits the reliability of the 
methods of functional prediction used by the Appellants, as it demonstrates that the 
activity (in this case the G-protein mediated signal transduction activity) of a protein can 
be predicted based on sequence identity of less than 35%. 
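
For readers less familiar with how a percent identity figure such as "less than 35%" is obtained, the following minimal sketch computes it from a pairwise alignment. It makes two assumptions that should be read as such: identity is counted over aligned columns in which neither sequence has a gap (other conventions use a different denominator), and the two short sequences are invented for the arithmetic and do not correspond to any histamine receptor or to 2871.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy alignment (illustrative only).
toy_a = "MKT-LLVA"
toy_b = "MRTALL-A"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

In this toy alignment six columns are ungapped and five of them match. Real comparisons of the kind cited in the references rely on alignment programs to produce the aligned strings in the first place.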

Despite the fact that histamine receptors share only moderate sequence identity 
with each other, the H1, H2, H3, and H4 receptors were each recognized as being a G- 
protein coupled receptor having G-protein mediated signal transduction activity based on 
sequence identity. For example, Yamashita et al. (1991) Biochem. 88:11515-11519, 
provided as Appendix M, describe the cloning of the H1 receptor and note that "[t]he 
histamine H1 receptor is highly similar to other G protein-coupled receptors." 
(Yamashita et al., id. at 11518). Similarly, Gantz et al. (1991) Proc. Natl. Acad. Sci. 
88:429-433, provided as Appendix N, describe the cloning of the H2 receptor and note 
that "comparison of the deduced amino acid sequence to that of other G-protein-linked 
receptors with presumed seven-transmembrane motifs revealed extensive homology." 
The H3 histamine receptor was identified and cloned based on a high degree of sequence 
similarity with biogenic amine GPCRs (Lovenberg et al. (1999) Mol. Pharmacol. 
55:1101-1107, provided as Appendix O). Finally, as described in the Appeal Brief, 
Nguyen et al describe the cloning of the H4 receptor based on a query of GenBank to 
identify sequences sharing sequence similarity with GPCRs. Thus, the G-protein 
mediated signal transduction activity of all of the histamine receptors was accurately 
predicted based on sequence similarity with known GPCRs. 



2. The tumor suppressor activity of p73 was predicted based on 
sequence identity with the known tumor suppressor p53. 

In the Appeal Brief, Appellants cite Dickman (1997) Science 277:1605-1606 
which teaches that the tumor suppressor activity of the p73 polypeptide was determined 
based on sequence similarity with the transcription activation, DNA-binding, and 
oligomerization domains of the known tumor suppressor protein p53. In response, the 
Examiner has argued that Dickman teaches that the p73 gene is deleted in certain 
cancers. However, a careful reading of Dickman finds that the original determination of 
p73 protein's tumor suppression activity was made on the basis of sequence similarity 
alone. Dickman teaches that p73 was identified in a screen for genes that respond 
to certain immune system regulators. Dickman states, "[w]hen the French team 
sequenced the many potential targets their screen had turned up, they were shocked to 
find out that one false positive had remarkable similarities to p53." (Dickman, id. at 
1605). It was only after p73's tumor suppression activity had been predicted on the basis 
of sequence similarity with p53 that the investigators thought to look for alterations in the 
p73 gene in cancer patients. 



3. Kliewer et al demonstrate the successful identification of novel 
nuclear receptors based on sequence similarity with the ligand-binding domain of known 
nuclear receptors, and the Examiner's arguments regarding Kliewer et al are based on 
an incorrect understanding of the teachings of this reference. 
In the Appeal Brief, the Appellants cite Kliewer et al. (1998) Cell 92:73-82 as an 
additional example of the accurate determination of a protein's function based on the 
presence of functional domains. In rebutting these arguments, the Examiner notes that 
the PXR.1 amino acid sequence is identical to the PXR.2 amino acid sequence except for 
a 41 amino acid deletion resulting from alternative splicing. This statement misses the 
point of the reference, which does not teach the isolation of the PXR.2 coding sequence 
based on the PXR.1 coding sequence but instead describes the cloning of both the PXR.1 
and PXR.2 coding sequences based on sequence identity with motifs characteristic of 
known nuclear receptors. See page 74 of Kliewer et al., which states, "[i]n an effort to 
identify new member of the nuclear receptor family, we performed a series of motif 
searches of public EST databases. These searches revealed a clone . . . that had 
homology to the ligand-binding domain of a number of nuclear receptors." The reference 
teaches that this EST was then used to clone the nuclear receptor PXR.1 and its splice 
variant PXR.2. Accordingly, the reference describes yet another successful use of 
sequence similarity with functional domains to predict protein function. 

II. The 2871 Receptor has Utility 

Applicants again note that these arguments are presented for the first time on 
appeal because the Examiner earlier indicated that the only issue was whether the 
disclosed sequence actually was a GPCR. Now, the Examiner asserts that even if the 
disclosed sequences are GPCRs, utility is not established. Because the Examiner has 
changed the utility rejection, Applicants have not had the opportunity to fully address the 
Examiner's arguments. Applicants here present these arguments in response to the 
Examiner's new and revived grounds of rejection. 

A. The 2871 receptor is useful in selectivity screening and therefore has a 
"well-established" utility. 

The Examiner has rejected claims 2, 9-14, 18-20, 22-30, and 33-37 under 35 
U.S.C. §101 on the grounds that the claimed invention "lacks patentable utility. 11 (Feb. 
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12, 2001 Office action page 3). This does not correctly reflect the view in the art, where 
it is known that "[historically, the superfamily of GPCRs has proven to be among the 
most successful drug targets and consequently these newly isolated orphan receptors have 
great potential for pioneer drug discovery." Stadel et al (1997) Trends Pharmacol Scl 
18:430-436; provided as Appendix P). Those of skill in the art recognize that the 
identification of a novel member of the G-protein coupled receptor family provides an 
immediate benefit. In addition to serving as reagents and targets in the diagnosis and 
treatment of 2871-mediated disorders as described in the specification on page 48 et seq., 
all members of the GPCR protein family have utility in selectivity screening of candidate 
drugs that target GPCRs. It is known in the art that the clinical usefulness of a 
therapeutic compound is determined not only by its ability to bind and modulate a 
molecular target of interest, but also by its selectivity. Drugs that bind selectively to their 
molecular target are highly preferred over those that bind to structurally-related 
molecules, as the selective compounds are far less likely to have unwanted side effects in 
clinical use. See, for example, Hartig (1993) NIDA Res. Monogr. 134: 58-65, entitled, 
"The use of cloned human receptors for drug design," provided as Appendix Q; Fraser 
(1995) J. Nucl. Med. 36 (6 Suppl): 17S-21S, provided as Appendix R. Thus, an 
important component of any drug development strategy is determining the selectivity of 
the candidate drug for the molecular target of interest over structurally-related 
polypeptides. The effectiveness of selectivity screening in uncovering interactions that 
may result in undesirable clinical side-effects increases in proportion with the number of 
structurally-related polypeptides screened. In this situation, the usefulness of these 
structurally-related polypeptides is not dependent on their biological role or 
ligand-binding properties; their utility comes from the fact that they share significant sequence 
identity with the molecular target of the candidate drug. 

An example of the use of orphan receptors in selectivity screening is found in 
Goodwin et al (2000) Molecular Cell 6:517-526, a copy of which is provided as 
Appendix S. This reference is directed to the identification of a specific agonist for FXR, 
an orphan nuclear receptor that regulates bile acid synthesis and is a target in the 
treatment of cholestasis. (See generally, Niesor et al. (2001) Curr. Pharm. Des. 7:231-259). 
Goodwin states that many previously-identified FXR ligands interact with other 
proteins including bile-acid-binding proteins and transporters (Goodwin at page 518, 
column 1). In order to identify a compound that selectively modulates FXR, the authors 
screened for compounds that modulated FXR activity and then tested these compounds 
for their ability to activate other nuclear receptors that share structural similarity with 
FXR. Figure 1C of Goodwin shows that the compound GW4064 potently activates FXR 
but does not modulate the activity of the other nuclear receptors tested. Note that the 
nuclear receptor panel screened in Figure 1C includes the orphan nuclear receptors SHP-1 
and LRH-1 in addition to receptors having previously-identified ligands, illustrating 
that studies often include orphan receptors. 

More than 50% of prescription drugs act at GPCR targets, further showing the 
importance of GPCRs in screens for effective drugs. However, some of these drugs have 
efficacy problems and limiting side-effects because the compounds do not differentiate 
between receptor subtypes. See generally, Stadel et al. (1997) Trends Pharmacol. Sci. 
18: 430; Lee and Kerlavage (1993) Molecular Biology of G-Protein-Coupled Receptors, 
6 DN&P 488. Accordingly, because the GPCR protein family includes a number of key 
drug targets, members of this family share a common use in the selectivity screening of 
candidate drugs. The 2871 receptor shares a high degree of identity with the rhodopsin 
family of GPCRs (see specification Figure 2). This rhodopsin GPCR family includes 
targets for the treatment of numerous disorders including depression, anxiety, migraine, 
asthma, hypertension, and cardiovascular disorders. Thus, all members of this important 
class of GPCRs, including those disclosed in the present invention, have a specific, 
immediately available, real world utility in the selectivity screening of drugs directed at 
GPCR targets. 

The therapeutic and economic benefits that can result from selectivity screening 
are well known. One example is the events of 1994-1997 leading to Merck's marketing 
of the painkiller Vioxx, described in Gardiner Harris, The Cure: With Big Drugs Dying, 
Merck Didn't Merge—It Found New Ones, The Wall Street Journal, January 10, 2001, at 
A1, provided as Appendix T. Merck's search for a novel pharmacologically suitable 
painkiller made use of in vitro screens to find drugs that inhibited the activity of Cox-2 
but not Cox-1. Such drugs would inhibit prostaglandin production in most of the body 
but not the gut, thereby ameliorating pain while avoiding undesirable side effects. 
Candidate drugs from a collection of hundreds of synthesized drugs were first subjected 
to in vitro screening; a much smaller number of successful in vitro candidates advanced 
to in vivo screening in mice, and two successful nontoxic drugs from the mouse in vivo 
screens were advanced to even more expensive human clinical trials. Only one of these 
two drugs showed efficacy in clinical trials, ultimately received FDA approval, and is 
now being marketed as Vioxx. This example illustrates how a "real world" benefit can 
be obtained from distinguishing gene family members. 

B. The 2871 sequence has a high degree of identity to other sequences 
that have utility; therefore, the 2871 sequence has utility. 

The USPTO utility examination guidelines state, "[w]hen a class of proteins is 
defined such that the members share a specific, substantial, and credible utility, the 
reasonable assignment of a new protein to the class of sufficiently conserved proteins 
would impute the same specific, substantial, and credible utility to the assigned protein." 
66 Fed. Reg. 1096. In the present application, Applicants have demonstrated that the 
2871 receptor is a member of the rhodopsin family of G-protein coupled receptors. 
Members of this family of receptors are known by those of skill in the art to share a 
specific, substantial, and credible utility. In fact, it has come to our attention that a U.S. 
patent has issued from an international application disclosed by Applicant in the 
Supplemental IDS returned by the Examiner with paper 8 (the Office Action mailed 
8/25/00). In U.S. Patent No. 6,063,596 (the '596 patent), with inventors Lai et al. and 
assigned to Incyte Pharmaceuticals, issued 16 May 2000, one of the disclosed sequences 
has 98% identity to Applicant's 2871 sequence. The claimed invention of the '596 patent 
is described as providing human G-protein coupled receptors associated with immune 
response. Applicants' present claims are directed to methods of using the 2871 sequence 
of the present invention. Because there is an issued U.S. patent with claims to sequences 
with a high degree of identity to Applicant's 2871 sequences, the Patent Office must have 
found these sequences to have utility. Accordingly, a rejection of Applicants' present 
claims for lack of utility is inappropriate and should be withdrawn. 

C. The identification of the 2871 ligand or cellular function is not a 
requirement for establishing the utility of this receptor. 

The Examiner has stated that the specification does not provide "any evidence or 
guidance suggesting the claimed protein's activity" (Examiner's Answer at page 3) and 
that therefore doubt is cast on "whether the nucleotide sequence or its encoded protein 
can be used in any of applicants asserted utilities." (emphasis added; Examiner's Answer 
at page 4). Applicants disagree. As discussed in the specification and known in the art, 
GPCRs (G-protein coupled receptors) are responsible for G-protein mediated signal 
transduction. "GPCRs, along with G proteins and . . . intracellular enzymes and channels 
modulated by G-proteins, are the components of a modular signaling system that 
connects the state of intracellular second messengers to extracellular inputs." 
(specification page 2; see also pp. 6, 7, 20). 

While the Examiner's assertion of lack of utility may reflect the thinking of the 
pre-genomics era, it does not accurately describe the current state of the art in drug 
discovery. Those of skill in the art appreciate that rapid advances in technology have led 
to dramatic changes in the way research is conducted in many biomedical-related areas. 
"Molecular biology has had a dramatic influence" on active drug discovery and research 
projects in the pharmaceutical industry, particularly those involving GPCRs. See Stadel 
et al. (1997) Trends Pharmacol. Sci. 18:430-436 (provided as Appendix P). The 
advances in molecular biology have led to what those in the art consider a "paradigm 
shift" in the way research and drug discovery is conducted. Id. In this new paradigm, the 
starting point in the process is the identification of new members of gene families such as 
the GPCR superfamily by "computational or bioinformatic methodologies." Stadel at 
430. "Once new members of the GPCR superfamily are identified, the recombinantly 
expressed receptors are used in functional assays to search for the associated novel 
ligands. The receptor-ligand pair are then used for compound bank screening to identify 
a lead compound that, together with the activating ligand, is used for biological and 
pathophysiological studies to determine the function and potential therapeutic value of a 
receptor antagonist (or agonist) in ameliorating a disease process." Stadel at 434; see 
also Fraser (1995), J. Nucl. Med. 36 (6 Suppl): 17S-21S. Often, these screens are 
implemented in high-throughput format. See id. Thus, in the molecular biology field of 
the present invention, the discovery of a novel sequence is the key step, or "first link" of 
Cross. See, Cross v. Iizuka, 753 F.2d 1040, 1051 (Fed. Cir. 1985) (holding that "[w]e 
perceive no insurmountable difficulty, under appropriate circumstances, in finding that 
the first link in the screening chain, in vitro testing, may establish a practical utility for 
the compound in question.") 

Similarly, in drug development, the key step or "first link" is the discovery of a 
novel sequence such as that of the present invention; subsequent screening steps are 
routinely performed. As those in the art note, "the potential reward of using this 
["reverse molecular pharmacological strategy"] approach is that resultant drugs naturally 
will be pioneer or innovative discoveries, and a significant proportion of these unique 
drugs may be useful to treat diseases for which existing therapies are lacking or 
insufficient." Stadel at 434. 

Because more than 50% of prescription drugs act at GPCR targets, members of 
this family share a common use in drug screening. The 2871 receptor shares a high degree 
of identity with the rhodopsin family of GPCRs and is expressed in tissues including 
those of particular clinical significance to immune disorders, such as T-helper cells (see 
Figure 5 and specification page 8, lines 25 et seq. and page 58, lines 21-24). Examples 
of disorders for which expression of the 2871 gene is relevant include disorders 
associated with expression of IL-4, IL-5 or of other cytokines, disorders associated with 
helper T-cell differentiation to Th1 versus Th2 cells, disorders associated with cellular 
immune responses, and disorders involving cytokine-mediated proinflammatory actions. 
See page 10, lines 1-15 of the specification. Accordingly, this receptor has a specific, 
immediately available, real world utility in the selectivity screening of drugs directed at 
GPCR targets. 

D. The rejection of the claims under 35 U.S.C. §101 and §112, first 
paragraph, is inconsistent with USPTO guidelines and supporting case law. 

The Utility Examination Guidelines state, "[Applicants] need only provide one 
credible assertion of specific and substantial utility for each claimed invention to satisfy 
the utility requirement." 66 Fed. Reg. 1098. This one-utility requirement is consistent 
with Cross, which held that "[w]hen a properly claimed invention meets at least one 
stated objective, utility under §101 is clearly shown." Cross, 753 F.2d at 1046 n.9, citing 
Raytheon Co. v. Roper Corp., 724 F.2d 951, 958 (Fed. Cir. 1983), cert. denied, 469 U.S. 
835 (1984). Thus, the Examiner's utility rejection depends on the invalidity of each of 
Applicants' asserted uses. However, as the Examiner noted (at page 3 of the Office 
Action mailed February 12, 2001 (paper 11)), "applicants do indeed provide multiple 
well-established and specific utilities for a GPCR." Inexplicably, the Examiner now 
states (at page 5 of the Examiner's Answer mailed August 28, 2001 (paper 18)) that 
"since there was no specific and substantial asserted utility or a well-established utility 
for the claimed nucleic acids and encoded proteins, credibility of the utility was not 
assessed." 

The PTO guidelines state, "[a] rejection based on lack of utility should not be 
maintained if an asserted utility for the claimed invention would be considered specific, 
substantial, and credible by a person of ordinary skill in the art in view of all evidence of 
record." 66 Fed. Reg. 1098. "Credibility is assessed from the perspective of one of 
ordinary skill in the art in view of the disclosure. ..." 66 Fed. Reg. 1098. As the 
Examiner noted (at page 3 of the Office Action mailed Feb. 12, 2001 (paper 11)), 
Applicants "do indeed provide multiple well-established and specific utilities for a 
GPCR," and one of ordinary skill in the art would agree with the Examiner that the 
present invention satisfies the utility standard. 

The PTO utility examination guidelines also state, 
[w]here the asserted utility is not specific or substantial, a 
prima facie showing [of no specific and substantial credible 
utility] must establish that it is more likely than not that a 
person of ordinary skill in the art would not consider that 
any utility asserted by the Applicants would be specific and 
substantial. The prima facie showing must contain the 
following elements: (1) An explanation that clearly sets 
forth the reasoning used in concluding that the asserted 
utility for the claimed invention is not both specific and substantial 
nor well-established; (2) Support for factual findings relied 
upon in reaching this conclusion; and (3) An evaluation of 
all relevant evidence of record, including utilities taught in 
the closest prior art. 

(66 Fed. Reg. 1098). Further, "[o]ffice personnel are reminded that they must treat as 
true a statement of fact made by Applicants in relation to an asserted utility, unless 
countervailing evidence can be provided that shows that one of ordinary skill in the art 
would have a legitimate basis to doubt the credibility of such a statement" (66 Fed. Reg. 
1098-99). 

This provision is consistent with the case law. See, In re Gazave, 379 F.2d 973 
(C.C.P.A. 1967) (finding that the utility standard was met where "appellant's assertions 
of usefulness in his specification appear to be believable on their face and 
straightforward, at least in the absence of reason or authority in variance"); Ex parte 
Dash, 27 U.S.P.Q.2d 1481, 1484 (Bd. Pat. App. & Int'f 1993) (holding that "[a] 
disclosure of a utility satisfies the utility requirement of section 101 unless there are 
reasons for the artisan to question the truth of such disclosure.") Similarly, in In re 
Jolles, claims to pharmaceutical compounds and methods of use were rejected under 
§101 and §112. The court held, "it is proper for the examiner to ask for substantiating 
evidence unless one with ordinary skill in the art would accept the allegations as 
obviously correct" (628 F.2d 1322, 1327 (C.C.P.A. 1980)). See also, In re Brana, 51 
F.3d 1560, 1563 (Fed. Cir. 1995) (stating that "[o]nly after the PTO provides evidence 
showing that one of ordinary skill in the art would reasonably doubt the asserted utility 
does the burden shift to the Applicants to provide rebuttal evidence sufficient to convince 
such a person of the invention's asserted utility," and holding that the PTO did not meet 
this burden.) 

In the present case, the utility rejection has not been supported in the required 
manner. As discussed above, the Examiner's objections are not properly grounded in the 
authority cited and are in fact inconsistent with practices in the art. Accordingly, the 
Examiner has not made a prima facie showing of no utility and the rejection should be 
withdrawn. 



CONCLUSION 

In view of the arguments presented above, Applicants contend that each of claims 
2, 9-14, 18-20, 22-30, and 33-37 is patentable. Therefore, reversal of the rejections under 
35 U.S.C. § 101 and 35 U.S.C. § 112, first paragraph, is respectfully solicited. 

Respectfully submitted, 

APPEALED CLAIMS 

2. An isolated antibody that selectively binds to a polypeptide having an 
amino acid sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in ATCC 
Deposit No. PTA-2369. 

9. A method for detecting the presence of a polypeptide having an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in 
ATCC Deposit No. PTA-2369; 

said method comprising contacting said sample with an agent that specifically allows 
detection of the presence of the polypeptide in the sample and then detecting the presence of 
the polypeptide. 

10. The method of claim 9, wherein said agent is capable of selective physical 
association with said polypeptide. 

11. The method of claim 10, wherein said agent binds to said polypeptide. 

12. The method of claim 11, wherein said agent is an antibody. 

13. The method of claim 11, wherein said agent is a ligand. 

14. A kit comprising reagents used for the method of claim 9, wherein the 
reagents comprise an agent that specifically binds to said polypeptide. 
18. A method for identifying an agent that binds to a polypeptide having an 
amino acid sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in 
ATCC Deposit No. PTA-2369; 



said method comprising contacting the polypeptide with an agent that binds to the 
polypeptide and assaying the complex formed with the agent bound to the polypeptide. 

19. A method for modulating the activity of a polypeptide having an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence shown in SEQ ID NO:1; and 

(b) the amino acid sequence encoded by the cDNA contained in 
ATCC Deposit No. PTA-2369; 

said method comprising contacting the polypeptide with an agent under conditions that 
allow the agent to modulate the activity of the polypeptide. 

20. The method of claim 19 wherein the activity is modulated in a subject 
with an inflammatory disorder. 

22. A method for identifying a compound that binds to a polypeptide 
having an amino acid sequence of SEQ ID NO:1, said method comprising the steps 
of: 

(a) contacting a polypeptide, or a cell expressing said with a test 
compound; and 

(b) determining whether the polypeptide binds to the test compound. 

23. The method of claim 22, wherein the binding of the test compound to 
the polypeptide is detected by a method selected from the group consisting of: 
(a) detection of binding by direct detecting of test 
compound/polypeptide binding; 

(b) detection of binding using a competition binding assay; and 

(c) detection of binding using an assay for GPCR-like-mediated 
signal transduction. 

24. A method for screening a cell to identify an agent that binds with 

a polypeptide having an amino acid sequence shown in SEQ ID NO:1 in said cell, 
said method comprising contacting said cell with an agent and detecting an interaction 
between said polypeptide and said agent. 

25. A method for screening a cell to identify an agent that modulates the 
expression level or activity of the polypeptide having an amino acid sequence 

in SEQ ID NO:1 in a cell, said method comprising contacting said cell with an agent 
and measuring the level or activity of said polypeptide. 

26. The method of claim 25, wherein said cell is an immune cell. 

27. The method of claim 25, wherein said agent increases the level or 
activity of said polypeptide. 

28. The method of claim 25, wherein said agent decreases the level or 
activity of said polypeptide. 

29. A method for modulating the activity of a polypeptide having an amino 
acid sequence shown in SEQ ID NO:1 in a cell comprising contacting said cell with a 
compound that binds to said polypeptide in a sufficient concentration to modulate the 
activity of the polypeptide. 



30. 
with an immune disorder. 

33. A method for modulating G-protein coupled receptor expression 
in disease states of a patient, comprising contacting a tissue from said patient with 
an isolated antibody that selectively binds to the polypeptide having an amino acid 
sequence shown in SEQ ID NO:1 in a sufficient concentration to modulate G-protein 
coupled receptor expression. 

34. The method of claim 33, wherein the G-protein coupled receptor 
expression is involved in signal transduction. 

35. The method of claim 33, wherein the G-protein coupled receptor 
expression is involved in immunity. 

36. The method of claim 35, wherein the G-protein coupled receptor 
expression is involved in cytokine production. 

37. The method of claim 36, wherein the G-protein coupled receptor 
expression is involved in IL-4 and IL-5 expression. 
Abstract 



In the wake of the numerous now-fruitful genome projects, we have witnessed a 'tsunami' of sequence data and 
with it the birth of the field of bioinformatics. Bioinformatics involves the application of information technology to 
the management and analysis of biological data. For many of us, this means that databases and their search tools 
have become an essential part of the research environment. However, the rate of sequence generation and the 
haphazard proliferation of databases have made it difficult to keep pace with developments, even for the 
cognoscenti. Moreover, increasing amounts of sequence information do not necessarily equate with an increase in 
knowledge, and in the panic to automate the route from raw data to biological insight, we may be generating and 
propagating innumerable errors in our precious databases. In the genome era upon us, researchers want rapid, 
easy-to-use, reliable tools for functional characterisation of newly determined sequences. For the pharmaceutical industry 
in particular, the Pandora's box of bioinformatics harbours an information-rich nugget, ripe with potential drug 
targets and possible new avenues for the development of therapeutic agents. This review outlines the current status 
of the major pattern databases now used routinely in the analysis of protein sequences. The review is divided into 
three main sections. In the first, commonly used terms are defined and the methods behind the databases are briefly 
described; in the second, the structure and content of the principal pattern databases are discussed; and in the final 
part, several alignment databases, which are frequently confused with pattern databases, are mentioned. For the 
new-comer, the array of resources, the range of methods behind them and the different tools required to search 
them can be confusing. The review therefore also briefly mentions a current international endeavour to integrate the 
diverse databases, which effort should facilitate sequence analysis in the future. This is particularly important for 
target-discovery programmes, where the challenge is to rationalise the enormous numbers of potential targets 
generated by sequence database searches. This problem may be addressed, at least in part, by reducing search 
outputs to the more focused and manageable subsets suggested by searches of integrated groups of family-specific 
pattern databases. © 2000 Elsevier Science Ltd. All rights reserved. 

Keywords: Bioinformatics; Similarity search; Sequence alignment; Pattern recognition; Function annotation 
1. Introduction 

Ten years from the dawn of the field of 
bioinformatics, we are harvesting the abundant 
fruits of a variety of genome projects and, in 
spite of early flood warnings, the resultant tor- 
rent of sequence information has all but broken 
our databanks. Biological databases are now a 
central part of the research environment, but 
many have evolved simply as a by-product of a 
particular individual's research project, with no 
thought that they might one day become valuable 
international treasures. Consequently, some have 
not stood the test of time (most do not survive 
beyond the first five years [1]). Others are creak- 
ing under the strain of information overload, 
their underlying technologies never having been 
designed to cope with such volumes of data. Still 
others have managed to survive via collaborative 
efforts, some on an international scale. The pro- 
tein sequence database (PSD), for example, 
evolved in the early 1960s from Margaret 
Dayhoff's research on the evolutionary relation- 
ships among proteins [2]. By 1980, the collection 
had grown to (a mere) 200 sequences [3], which 
in the last two decades has increased more than 
600 fold to ~131,000 (release 61, June 1999). The 
PSD is now maintained collaboratively by PIR- 
International [4] and is one of the most compre- 
hensive protein sequence collections currently 
available. 

Today, there are hundreds of databanks 
around the world housing information at the 
levels of the genome, the proteome and even the 
metabolome [5]. The endeavour to cope with and 
rationalise these vast quantities of data has 
required global co-operation and ever increasing 
levels of automation in data handling and analy- 
sis. However, automation carries a price. In the 
field of genomics, for example, although software 
robots are essential to the process of functional 
annotation of newly determined sequences, they 
pose a threat to information quality because they 
can introduce and propagate mis-annotations [6]. 
The curators are aware of this and always strive 
to improve the quality of their resources, but 
databases are nevertheless historical products (or, 
in some cases, historical accidents!) and are 
therefore far from perfect. To get the most from 
current biological databases it is thus important to 
have an understanding both of their powers and 
of their pitfalls. 

The first step towards functional characteris- 
ation of a new sequence usually involves trawling 
a sequence database with tools such as BLAST 
[7] or FASTA [8]. Such searches quickly reveal 
similarities between the query and a range of 
database sequences. The trick then lies in the re- 
liable inference of homology (the verification of a 
divergent evolutionary relationship) and, from 
this, the inference of function. Ideally, a search 
output will show unequivocal similarity to a well- 
characterised protein over the full length of the 
query. At worst, an output will reveal no signifi- 
cant hits, but the usual scenario is a list of partial 
matches to diverse proteins, many of them 
uncharacterised and some with dubious or con- 
tradictory annotations [9]. 

There are various reasons for this sort of con- 
fusion. For example, the increasing size of 
sequence databases and their population by 
greater numbers of poorer quality partial 
sequences (such as expressed sequence tags), 
gives rise to an increasing likelihood that high- 
scoring matches will be made to a query simply 
by chance. So-called low-complexity matches, in 
particular, may swamp search outputs — these 
are regions within a sequence that have high den- 
sities of particular residues (e.g. poly-GxP, such 
as occurs in repetitive, often tightly structured 
sequences like collagen; or polyglutamine tracts 
that occur in Huntington's disease protein; and 
so on). Although mechanisms are available for 
masking such sequences, their incautious use may 
also create complications. The modular and/or 
domain nature of many proteins also causes pro- 
blems on different levels. First, when matching 
multidomain proteins, it may not be clear which 
domain or domains correctly correspond to the 
query. Second, even if the right domain has been 
identified, it may not be appropriate to transfer 
the functional annotation to the query because 
the function of the matched domain may be 
different, depending on its precise biological 
context. Similar issues arise with the existence of 
multigene families, because database search 
techniques cannot differentiate between a matched 
orthologue (the functional counterpart of a 
sequence in another species) and a matched 
paralogue (a homologue that performs different but 
related functions within the same organism). 

Fig. 1. At the heart of sequence analysis methods is the multiple sequence alignment. Application of these methods involves the 
derivation of some kind of representation of conserved features of the alignment, which may be diagnostic of structure or function. 
Various terms are used to describe the different types of data representation, as shown. Within a single conserved region (motif), 
the sequence information may be reduced to a single consensus expression (a regular expression), often simply referred to as a 
pattern. In this example, square brackets indicate residues that are allowed at this position of the motif and x denotes any residue, the 
(2) indicating that any residue can occupy two consecutive positions in the motif. The term used to describe groups of motifs in which 
all the residue information is retained within a set of frequency (identity) matrices is a fingerprint. Adding a scoring scheme to such 
sets of frequency matrices results in position-specific weight matrices, or blocks. Using information from extended conserved 
regions that include gaps (usually referred to as domains) gives rise to profiles; and probabilistic models derived from alignment 
profiles are termed hidden Markov models. 
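
The pattern notation described in the caption to Fig. 1 (square brackets for the residues allowed at a position, x for any residue, and a parenthesised count for consecutive positions) maps closely onto ordinary regular expressions. The following is a minimal sketch of that mapping, not the search code of any particular database: it handles only the three notational elements mentioned in the caption, and the function names, the pattern and the test sequence are all hypothetical, chosen purely for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supported elements (separated by '-'): a single residue (e.g. C),
    a set of allowed residues (e.g. [DE]), x for any residue, and an
    optional repeat count in parentheses (e.g. x(2)).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        token = "." if residue == "x" else residue   # x matches any residue
        if count is not None:
            token += "{" + count + "}"               # x(2) -> .{2}
        parts.append(token)
    return "".join(parts)

def find_motif(sequence: str, pattern: str):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical pattern and sequence, for illustration only.
hits = find_motif("MKCAAEGLLCPQDGT", "C-x(2)-[DE]-G")
print(hits)
```

Because a pattern of this kind either matches or does not, a single unexpected substitution in a true family member is enough to cause a miss, which is one reason the other descriptor types retain more of the residue information in the alignment.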

Table 1 

Web addresses of pattern and alignment databases in common use; for a more exhaustive list, refer to the 
annual database issue of Nucleic Acids Research (http://www3.oup.co.uk/nar/) 

Database    Web address 
PROSITE     http://www.expasy.ch/prosite/ 
BLOCKS      http://www.blocks.fhcrc.org/ 
PRINTS      http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ 
IDENTIFY    http://dna.Stanford.EDU/identify/ 
Profiles    http://www.isrec.isb-sib.ch/software/PFSCAN_form.html 
Pfam        http://www.sanger.ac.uk/Software/Pfam/ 
ProDom      http://www.toulouse.inra.fr/prodom.html 
SBASE       http://www.icgeb.trieste.it/sbase/ 
PIR-ALN     http://www-nbrf.georgetown.edu/pirwww/search/textpiraln.html 
PROT-FAM    http://vms.mips.biochem.mpg.de/mips/programs/classification.html 
DOMO        http://www.infobiogen.fr/~gracy/domo/ 
ProClass    http://pir.georgetown.edu/gfserver/proclass.html 
ProtoMap    http://www.protomap.cs.huji.ac.il/ 
PIMA        http://dot.imgen.bcm.tmc.edu:9331/seq-search/protein-search.html 
ProWeb      http://www.proweb.org/kinesin/ProWeb.html 

Achieving consistent, reliable functional 
assignments can be a complicated process. As a result, 
in addition to routine searches of the sequence 
databases, it is now customary to extend search 
strategies to include a range of 'value-added' or 
pattern databases. These distil information within 
groups of related sequences into potent 
descriptors or discriminators that aid family diagnosis. 
Searching pattern databases is more sensitive and 
selective than sequence database searching 
because derived family discriminators can detect 
weaker regions of similarity. Different analytical 
approaches have been used to create a 
bewildering array of discriminators, which are variously 
termed regular expressions, rules, profiles, 
signatures, fingerprints, blocks, etc. [10]; these terms 
are summarised in Fig. 1. The different 
descriptors have different diagnostic strengths and 
weaknesses and different areas of optimum application 
and have been used to generate different pattern 
databases, which also tend to differ in content! 
The aim of this review is to provide an overview 
of the current status of pattern and alignment 
databases in common use and to provide pointers 
on how best to use them. As this is a rapidly 
developing area, a list of Web addresses is given 
in Table 1 to allow readers to obtain the most 
up-to-date information on the resources 
discussed. 



2. The methods behind the databases 

At the heart of the analysis methods that 
underpin pattern databases is the multiple 
sequence alignment. When building an alignment, 
as more distantly related sequences are included, 
insertions are often required to bring equivalent
parts of adjacent sequences into the correct register,
as illustrated schematically in Fig. 1. As a
result of this gap insertion process, islands of 
conservation emerge from a backdrop of muta- 
tional change. These conserved regions (typically 
around 10-20 amino acids in length) tend to cor- 
respond to the core structural or functional el- 
ements of the protein; they are most commonly 
termed motifs, but are also referred to as blocks, 
segments or features. 

Several techniques have evolved to exploit the 
conservation encoded in sequence alignments, as 
shown in Fig. 2 [11]. Broadly, the methods fall 
into three categories, depending on whether they 
use single motifs, multiple motifs or full domain 
alignments. Whatever the approach, all involve 
the derivation of some kind of discriminatory 
representation of the conserved alignment el- 
ements — essentially, the conserved motifs pro- 
vide a characteristic signature or fingerprint for 
the family, which can be used to facilitate diag- 
nosis of future query sequences. 

The diagnostic success of the different methods 
depends on how reliably true family members 
(true positives) can be distinguished from non- 
family members (true negatives). In practice, 
there is a crucial balance between the number of 
incorrect matches that are made (false positives) 
and the number of correct matches that are 
missed (false negatives) at a given scoring 
threshold. As shown in Fig. 3, for a given search, 
the distribution of true positive matches must be 
resolved from that of the true negatives, such 
that the overlap between them is minimised or 
eliminated. This is important because, for 
matches in the overlapping area, it can be diffi- 
cult or impossible to determine which are correct 
(statistical approaches are used to assign confi- 
dence levels to matches in this area, but math- 
ematical significance does not give biological 
proof). The different analytical methods that 
have been designed to improve the resolving 
power of database searches are outlined below. 
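
The trade-off between false positives and false negatives at a given
threshold can be illustrated with a minimal sketch. The scores and
family labels below are hypothetical, chosen only to show how moving
the threshold shifts matches between the four categories; the function
name is likewise illustrative.

```python
def confusion_at_threshold(scored_hits, threshold):
    """Count outcomes for (score, is_family_member) pairs at one threshold."""
    tp = sum(1 for s, member in scored_hits if s >= threshold and member)
    fp = sum(1 for s, member in scored_hits if s >= threshold and not member)
    fn = sum(1 for s, member in scored_hits if s < threshold and member)
    tn = sum(1 for s, member in scored_hits if s < threshold and not member)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical search results: (score, True if the sequence really belongs).
hits = [(95, True), (88, True), (72, True), (70, False),
        (64, True), (51, False), (40, False), (33, False)]

for t in (60, 71, 90):
    print(t, confusion_at_threshold(hits, t))
```

In this toy data set the true and false score distributions overlap
between 64 and 70, so no single threshold recovers every family member
without also admitting an unrelated sequence.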

2.1. Single-motif methods

Of the various approaches, single-motif (regular expression pattern)
methods are easiest to understand. The idea is that a particular
protein family can be characterised by the single most conserved,
often functionally important, region (e.g. an enzyme active site)
observed in a sequence alignment. The motif is reduced to a consensus
expression in which all but the most significant residue information
is discarded. For example, the short expression D-[ALV]-x-{YW}-T means
that a conserved aspartic acid (D) residue is followed by a
hydrophobic residue, which may be alanine (A), leucine (L) or valine
(V); this is followed by an arbitrary residue (x) and any residue
except tyrosine (Y) or tryptophan (W); and finally a conserved
threonine (T). No other residues or residue combinations are tolerated
by the expression; matches to it must therefore be exact, or will be
disregarded.

Fig. 2. Illustration of the three principal methods for building
pattern databases: i.e. using single motifs, multiple motifs and full
domain alignments. Single-motif (regular expression pattern)
approaches have given rise to the PROSITE and IDENTIFY databases;
multiple-motif methods have spawned the BLOCKS and PRINTS databases;
and domain alignment methods have resulted in the Profiles and Pfam
resources.

Fig. 3. Resolving true and false matches. In a database search, the
desire is to establish which sequences are related to the query (i.e.
are true positive) and which are unrelated (true negative). At a given
scoring threshold, it is likely that several unrelated sequences will
match erroneously (so-called false positives) and several correct
matches will fail to be diagnosed (false negatives). In sequence
analysis, the challenge is to improve diagnostic performance by
capturing all (or the majority) of true positive family members,
including no (or few) false positives and minimising or precluding
false negatives. [Plot not reproduced; recoverable labels: "Number of
matches" and "Threshold".]

So rigid is this syntax that regular expression 
patterns do not perform well when used to rep- 
resent highly divergent protein families. For 
example, these patterns will fail to match signifi- 
cant sequences if they contain a single amino 
acid difference — hence, the sequence DAMYT 
is a mis-match, in spite of matching the above ex- 
pression in all but one position (it has a forbid- 
den tyrosine as its fourth residue). Conversely, a 
pattern will match anything that corresponds to 
it exactly, regardless of whether it is a true family 
member. The problem is that matches to single 
motifs lack biological context — a match to a 
pattern is just a match to a pattern and may well 
only be fortuitous. To assess the likelihood of a 
match being 'real', it must be verified with corro- 
borating evidence, whether via other database 
searches, the literature, experiment, etc. 
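
The all-or-nothing behaviour of such an expression can be reproduced
with a short sketch that translates the pattern syntax described above
into an ordinary regular expression. The function names are
illustrative and the translation covers only the elements used in this
review (fixed residues, [allowed] sets, {forbidden} sets, x and repeat
counts); it is not the code used by any of the databases.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or [LIVM](3).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # fixed residue or [allowed set]
        if lo:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


def matches(pattern, sequence):
    """True if the pattern occurs anywhere in the sequence."""
    return re.search(prosite_to_regex(pattern), sequence) is not None


pattern = "D-[ALV]-x-{YW}-T"
print(prosite_to_regex(pattern))
print(matches(pattern, "DAMPT"))   # satisfies every position
print(matches(pattern, "DAMYT"))   # forbidden tyrosine at position four
```

Because the result is a plain yes or no, a sequence that differs at a
single position is indistinguishable from one that differs at all of
them.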



Table 2

Overlapping sets of amino acids and their properties; these are used
to create the permissive regular expressions used as the basis of the
IDENTIFY resource

Residue property       Residue groups
Small                  Ala, Gly
Small hydroxyl         Ser, Thr
Basic                  Lys, Arg
Aromatic               Phe, Tyr, Trp
Basic                  His, Lys, Arg
Small hydrophobic      Val, Leu, Ile
Medium hydrophobic     Val, Leu, Ile, Met
Acidic/amide           Asp, Glu, Asn, Gln
Small/polar            Ala, Gly, Ser, Thr, Pro



An approach that addresses the strict nature of 
exact regular expression matching is to assign 
amino acid residues to distinct, but overlapping, 
substitution groups corresponding to various bio- 
chemical properties (e.g. charge and size), as 
shown in Table 2. This is a biologically sensible 
approach because each amino acid has several 
properties and can serve different functions, 
depending on its biochemical context [12]. 
However, although the technique is more flexible, 
its inherent permissiveness brings with it an inevi- 
table signal-to-noise trade-off — i.e. resulting 
patterns not only have the potential to make 
more true positive matches, but they will conse- 
quently also match more false positives. For 
example, the sequence DAMPS, which would be 
excluded by the exact regular expression above, 
would be matched by the permissive one (because 
Ser and Thr belong to the same group), even if 
threonine were biologically mandatory at the last 
position of the motif. 
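
The DAMPS example can be checked with a small sketch. It assumes, purely
for illustration, that each fixed residue in the exact expression is
relaxed to the smallest Table 2 group that contains it; how IDENTIFY
actually chooses groups is not described here, and the names below are
hypothetical.

```python
import re

# Overlapping substitution groups from Table 2, in one-letter code.
GROUPS = ["AG", "ST", "KR", "FYW", "HKR", "VLI", "VLIM", "DENQ", "AGSTP"]


def smallest_group(residue):
    """Character class for the smallest Table 2 group containing the residue."""
    group = min((g for g in GROUPS if residue in g), key=len, default=residue)
    return "[" + group + "]"


# Exact expression D-[ALV]-x-{YW}-T, written element by element.
exact = ["D", "[ALV]", ".", "[^YW]", "T"]
# Assumption: only the fixed single residues are relaxed.
permissive = [smallest_group(e) if e.isalpha() else e for e in exact]

exact_re = re.compile("".join(exact))
permissive_re = re.compile("".join(permissive))

for seq in ("DAMPT", "DAMPS", "EAMPS"):
    print(seq, bool(exact_re.fullmatch(seq)), bool(permissive_re.fullmatch(seq)))
```

The relaxed expression accepts DAMPS, and also EAMPS, which illustrates
how the gain in sensitivity is paid for with additional potential false
positives.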

2.2. Multiple-motif methods 

In response to the problems inherent in single- 
motif methods, diagnostic techniques sub- 
sequently evolved to exploit multiple motifs. 
Within a sequence alignment, it is usual to find 
not one, but several motifs that characterise the 
aligned family. Diagnostically, it makes sense to 
use many or all such conserved regions to build a 
family signature or fingerprint. In a database
search, there is then a greater chance of identifying
a distant relative, whether or not all parts of
the signature are matched. For example, a 
sequence that matches only four of seven motifs 
may still be diagnosed as a true match if the 
motifs are matched in the correct order in the 
sequence and the distances between them are 
consistent with those expected of true neighbour- 
ing motifs. The ability to tolerate mis-matches, 
both at the level of individual residues within 
motifs and at the level of motifs within the com- 
plete signature, renders multiple-motif matching 
a powerful diagnostic approach. 
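
A rough sketch of the order-and-distance check described above follows.
The motif positions, the minimum of four motifs and the tolerance are
all hypothetical values chosen for illustration; real fingerprint
searches use their own scoring and statistics.

```python
def is_consistent_partial_match(hits, expected_starts, min_motifs=4, tolerance=15):
    """Accept a partial signature match if enough motifs are found in the
    right order and at roughly the expected distances from one another.

    hits            -- {motif number: start position found in the query}
    expected_starts -- {motif number: typical start position in the family}
    """
    if len(hits) < min_motifs:
        return False
    order = sorted(hits)  # motif numbers in signature order
    for a, b in zip(order, order[1:]):
        observed = hits[b] - hits[a]
        expected = expected_starts[b] - expected_starts[a]
        if observed <= 0:                        # motifs out of order
            return False
        if abs(observed - expected) > tolerance:  # implausible spacing
            return False
    return True


# Hypothetical seven-motif fingerprint.
expected = {1: 10, 2: 45, 3: 80, 4: 120, 5: 160, 6: 200, 7: 240}

print(is_consistent_partial_match({1: 12, 3: 85, 4: 121, 6: 205}, expected))
print(is_consistent_partial_match({1: 200, 3: 85, 4: 121, 6: 12}, expected))
print(is_consistent_partial_match({1: 12, 3: 85}, expected))
```

The first query matches four of seven motifs in order and is accepted;
the second matches the same motifs in the wrong order, and the third
matches too few.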

Different multiple-motif methods have arisen, 
depending both on the technique used to detect 
the motifs and on the scoring method employed. 
Probably the simplest to understand is the tech- 
nique of fingerprinting [13]. Here, groups of con- 
served motifs are excised from a sequence 
alignment and used to create a series of fre- 
quency (identity) matrices — no mutation or 
other similarity data are used to weight the 
results. The scoring scheme is thus based on the 
calculation of residue frequencies for each pos- 
ition in the motifs, summing the scores of identi- 
cal residues for each position of a retrieved 
match. However, the main strength of this 
approach also gives rise to its main weakness. In 
other words, because the method exploits 
observed residue frequencies, the scoring matrices 
are sparse and thus perform cleanly (with little 
noise) and with high specificity; at the same time, 
their absolute scoring potential is limited by the 
nature of the observed data. For richly populated 
families, this is not a problem, because the result- 
ing matrices will reflect the constituent sequence 
diversity; but for poorly populated families, the 
matrices may be too sparse and may not encode 
sufficient variation to be able to detect distant 
relatives reliably, if at all. 
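
The frequency-based scoring just described, and its weakness for small
families, can be seen in a minimal sketch. The four-sequence motif is
hypothetical and the function names are illustrative.

```python
from collections import Counter


def frequency_matrix(motif_alignment):
    """One table of residue counts per column of an ungapped motif."""
    return [Counter(column) for column in zip(*motif_alignment)]


def score(matrix, segment):
    """Sum, over positions, how often the segment's residue was observed."""
    return sum(column[residue] for column, residue in zip(matrix, segment))


# Hypothetical motif excised from an alignment of four family members.
motif = ["DAMPT", "DLKPT", "DVMST", "DAKPT"]
matrix = frequency_matrix(motif)

for segment in ("DAMPT", "DIRQT", "EAMPS"):
    print(segment, score(matrix, segment))
```

Residues never observed at a position contribute nothing, so a distant
relative carrying conservative substitutions (isoleucine for leucine,
say) scores no better at those positions than an unrelated sequence.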

One way to address this problem is to use mu- 
tation or substitution matrices to weight noniden- 
tical residue matches. Commonly used scoring 
matrices include the PAM [14] and BLOSUM 
series [15]. The former is based on the concept of 
the point accepted mutation (PAM). PAM 250 is 
often used as a default matrix in comparison programs
because it gives similarity scores equivalent to 20%
matches remaining between two sequences, the twilight
zone [16] of similarity.
The BLOSUM matrices, which are derived from 
observed substitutions in blocks of aligned 
sequences from the BLOCKS database, were 
designed to detect distant similarities more re- 
liably than the Dayhoff series, which can only 
infer remote relationships because their substi- 
tution rates were derived from sets of highly simi- 
lar sequences. Whatever the approach used, 
however, similarity matrices are inherently noisy 
because they indiscriminately weight both ran- 
dom matches and weak signals. Thus care should 
be taken to select a scoring matrix appropriate to 
the evolutionary distance at which relationships 
are being sought. For practical purposes, this 
means using a range of different matrices (though 
few people actually bother!). 

2.3. Profile methods

An alternative philosophy to the motif-based 
approach of protein family characterisation 
adopts the principle that the variable regions 
between conserved motifs also contain valuable 
information. Here, the complete conserved por- 
tion of the alignment (including gaps) effectively 
becomes the discriminator. The discriminator, 
termed a profile, defines which residues are 
allowed at given positions, which positions are 
highly conserved and which degenerate, and 
which positions, or regions, can tolerate inser- 
tions. The scoring system is intricate and may 
include evolutionary weights and results from 
structural studies, as well as data implicit in the 
alignment. In addition, variable penalties may be 
specified to weight against insertions and del- 
etions occurring within core secondary structure 
elements [17,18]. Profiles provide a sensitive 
means of detecting distant sequence relationships 
where only very few residues are well-conserved. 

Just as there are different ways of using motifs to characterise
protein families, so there are different ways of using domain
alignments to build family discriminators. An extension of the concept
of profiles lies in the application of hidden Markov models (HMMs)
[19]. These are probabilistic models consisting of a number of
interconnecting states; they are essentially linear chains of match,
delete or insert states that attempt to encode the sequence
conservation within aligned families. A match state is assigned to
each conserved column in a sequence alignment; an insert state allows
for insertions relative to the match states; and delete states allow
positions to be skipped. Probabilities or costs (negative log
probabilities) are associated with each emission and each transition
between states. To align a sequence is to find the highest-probability
(lowest-cost) path through the HMM. A linear HMM is depicted in
Fig. 4.

Fig. 4. Linear hidden Markov model (HMM). Each position of an
alignment is represented as a match (M), an insert (I), or a delete
(D) state in the HMM. This allows a query sequence to be aligned by
assigning the most probable state transition to each of its residues.
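
Finding the lowest-cost path can be sketched with a small dynamic
programme over match, insert and delete states. This is a simplified
illustration, not the procedure of any particular HMM package: all
costs are hypothetical, transition costs depend only on the state types
involved, every state type may follow every other, and the begin state
is treated as a silent match.

```python
import math

INF = (math.inf, "")


def viterbi(seq, match_costs, trans, insert_cost=2.0, unseen_cost=4.0):
    """Return (lowest cost, state path) for aligning seq to a linear HMM.

    match_costs -- one {residue: emission cost} dict per match column
    trans       -- {"MM": cost, "MI": cost, ...} for each pair of state types
    """
    n, length = len(seq), len(match_costs)
    # V[state][i][j]: best (cost, path) after i residues and j columns.
    V = {s: [[INF] * (length + 1) for _ in range(n + 1)] for s in "MID"}
    V["M"][0][0] = (0.0, "")

    def best(i, j, to_state):
        # Cheapest way to enter to_state from any state at cell (i, j).
        return min((V[s][i][j][0] + trans[s + to_state], V[s][i][j][1])
                   for s in "MID")

    for i in range(n + 1):
        for j in range(length + 1):
            if i > 0 and j > 0:   # match: consume a residue and a column
                cost, path = best(i - 1, j - 1, "M")
                emit = match_costs[j - 1].get(seq[i - 1], unseen_cost)
                V["M"][i][j] = (cost + emit, path + "M")
            if i > 0:             # insert: consume a residue only
                cost, path = best(i - 1, j, "I")
                V["I"][i][j] = (cost + insert_cost, path + "I")
            if j > 0:             # delete: skip a column, emit nothing
                cost, path = best(i, j - 1, "D")
                V["D"][i][j] = (cost, path + "D")
    return min(V[s][n][length] for s in "MID")


# Hypothetical three-column model and transition costs.
columns = [{"D": 0.1}, {"A": 0.3, "L": 0.5, "V": 0.5}, {"T": 0.1}]
trans = {"MM": 0.1, "MI": 2.0, "MD": 2.0,
         "IM": 0.5, "II": 1.0, "ID": 3.0,
         "DM": 0.5, "DI": 3.0, "DD": 1.0}

for query in ("DAT", "DT", "DAGT"):
    cost, path = viterbi(query, columns, trans)
    print(query, path, round(cost, 2))
```

The returned path shows where the model had to skip a column or absorb
an extra residue in order to accommodate the query.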

Although capable of providing precise descrip- 
tors for particular families, as with all methods, 
there are drawbacks. One problem arises from 
the specificity of profiles and HMMs. For 
example, they may be well trained for a given 
family, but an outlier that was not included in 
the training set may be missed if features of its 
sequence are incompatible with the model. 
Another problem relates to the automatic, itera- 
tive nature of HMM training; without adequate 
supervision, the process may include false posi- 
tive matches, which may ultimately corrupt the 
model and lead to profile dilution. 



3. Pattern databases 

As a consequence of the range of sources of sequence data and the
variety of ways of analysing sequences and encoding protein families,
a number of different pattern databases have evolved to house the
different descriptors outlined in the previous section. The databases
and their associated methods are summarised in Table 3. Despite their
differences, pattern databases have arisen from a common principle:
i.e. homologous sequences share conserved motifs, presumably crucial
to the structure or function of the protein, which can be used to
build discriminators for particular protein families. An unknown query
sequence may be searched against a library of such descriptors to
determine whether or not it contains any of the predefined
characteristics and hence whether or not it can be assigned to a known
family. If the structure and function of the family is known, searches
of pattern databases thus theoretically offer a fast track to the
inference of biological function. Because these resources are derived
from multiple sequence information, searches of them are often better
able to identify distant relationships than are searches of the
sequence databases. However, none of the pattern databases is yet
complete. They should therefore be used to augment sequence searches,
rather than to replace them. The status of some of the commonly used
pattern resources is outlined below.

Table 3

Some of the major pattern databases in common use; in each case, the
primary source is noted, together with the type of pattern stored
(e.g. regular expression, fingerprint, HMM, etc.)

Pattern database   Data source          Stored information
PROSITE            SWISS-PROT           regular expressions (patterns)
PRINTS             SWISS-PROT/TrEMBL    aligned motifs (fingerprints)
Profiles           SWISS-PROT           gapped weight matrices (profiles)
Pfam               SWISS-PROT/TrEMBL    gapped domain alignments (HMMs)
BLOCKS             PROSITE/PRINTS       aligned motifs (blocks)
IDENTIFY           BLOCKS/PRINTS        permissive regular expressions (patterns)

PROSITE, the first pattern database to have been developed, houses
motifs in the form of regular expressions [20]. The process of
deriving regular expressions first requires the construction of a
multiple alignment and then location of the conserved regions. The
most conserved segment is selected and its sequence information
reduced to a consensus pattern, which is used to search SWISS-PROT
[21]. Results are checked manually to determine how well the pattern
has performed; ideally, there should be only true matches. Patterns
whose diagnostic performance is compromised by matching too many false
positives are manually adjusted and SWISS-PROT is rescanned. The
process of fine tuning is repeated until an optimal pattern is
created. If a family cannot be fully characterised by a single motif,
additional patterns are designed to encode other well-conserved parts
of the alignment. The fine-tuning process is then repeated until a set
of patterns is achieved that is capable of capturing all, or most, of
the family without matching too many, or any, false positives. When
the best pattern, or set of patterns, has been achieved, the results
are manually annotated for inclusion in the database.

ID   BACTERIAL_OPSIN_1; PATTERN.
AC   PS00950;
DT   JUN-1994 (CREATED); JUN-1994 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   Bacterial rhodopsins signature 1.
PA   R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).
NR   /RELEASE=36,74019;
NR   /TOTAL=22(22); /POSITIVE=22(22); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=1; /PARTIAL=1;
CC   /TAXO-RANGE=A????; /MAX-REPEAT=1;
DR   P19585, BAC1_HALS1, T; P29563, BAC2_HALS2, T; P96787, BAC3_HALSD, T;
DR   Q48334, BAC3_HALVA, T; P33970, BACH_HALHM, T; Q48315, BACH_HALHP, T;
DR   Q48314, BACH_HALHS, T; P16102, BACH_HALSP, T; P33742, BACH_HALSS, T;
DR   P94853, BACH_HALVA, T; P15647, BACH_NATPH, T; Q57101, BACR_HALAR, T;
DR   P02945, BACR_HALHA, T; P33969, BACR_HALHM, T; P33971, BACR_HALHP, T;
DR   P33972, BACR_HALHS, T; Q53496, BACR_HALSR, T; P94854, BACR_HALVA, T;
DR   P25964, BACS_HALHA, T; P33743, BACS_HALSS, T; P71411, BACT_HALSA, T;
DR   P42196, BACT_NATPH, T;
DR   Q53461, BACH_HALAR, P;
DR   P42197, BACT_HALVA, N;
3D   1BRD; 2BRD; 1BAC; 1BAD; 1BHA; 1BHB; 1BCT; 1SR1;
DO   PDOC00291;
//

Fig. 5. Example PROSITE entry, showing the data file for the
bacteriorhodopsin pattern. When viewing PROSITE on the Web, accession
numbers are hyperlinked, allowing direct access to the corresponding
SWISS-PROT entry for each sequence matched. Similarly, the
documentation file for a given pattern can be accessed via the
hyperlinked PDOC accession number at the bottom of the file.

Entries are deposited in PROSITE in two distinct files: (i) a
structured data file that houses the pattern and lists all matches in
the parent version of SWISS-PROT, as shown in Fig. 5; (ii) a
free-format documentation or annotation file, which provides details
of the characterised family and, where known, a description of the
biological role of the chosen motif/s and a supporting bibliography. A
number of features of the data file are worthy of note. Apart from the
identifier (ID) and description (DE) lines, which identify the
characterised family, aspects of the DR and especially the NR lines
are crucial to understand. The DR lines list all true (T), possible
(P), false (F) and missed/negative (N) matches to the pattern; these
results are summarised in the NR lines. In the example shown in
Fig. 5, 22 matches are made to the pattern, all of which are true, one
is possible (a fragment) and there is a single false negative match,
BACT_HALVA. Inspection of its sequence (e.g. by following its
hyperlinked accession number, P42197, from this page on the Web)
reveals that a disallowed asparagine in the penultimate position of
the motif (RYVDWLLTTPLNV) is the reason for the mis-match. Referring
back to the pattern line, we see that only members of the group [LIVM]
are allowed in the last three positions of the motif (as denoted by
[LIVM](3)). The quality of a pattern can thus immediately be
ascertained from the NR lines, which are therefore probably the most
important lines to inspect when first viewing a PROSITE entry. In some
cases, there are numerous false positives and false negatives
(especially for large super-families with substantial numbers of
divergent sequences, such as G-protein-coupled receptors, lipocalins,
etc.). Such patterns are diagnostically unreliable and are a
limitation to the diagnostic potential of the database. PROSITE
release 15 (July 1998), with updates to April 1999, contains 1014
entries characterised by 1352 patterns. The database is accessible for
searching via the ExPASy Web server and is maintained collaboratively
at the Swiss Institute of Bioinformatics.
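
As a rough illustration of how the NR summary can be read
programmatically, the sketch below pulls the counts out of the NR lines
of the Fig. 5 entry and confirms why the BACT_HALVA motif is missed.
The parsing approach is an assumption made for this example, the
regular expression is a hand translation of the PA line, and the
"variant" sequence is hypothetical.

```python
import re


def parse_nr(lines):
    """Collect the leading integer of each /KEY=value field on NR lines."""
    stats = {}
    for line in lines:
        if line.startswith("NR"):
            for key, value in re.findall(r"/(\w+)=(\d+)", line):
                stats[key] = int(value)
    return stats


entry = [
    "PA   R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).",
    "NR   /RELEASE=36,74019;",
    "NR   /TOTAL=22(22); /POSITIVE=22(22); /UNKNOWN=0(0); /FALSE_POS=0(0);",
    "NR   /FALSE_NEG=1; /PARTIAL=1;",
]
stats = parse_nr(entry)
print(stats)

# Hand translation of the PA line into an ordinary regular expression.
signature = re.compile(r"RY.[DT]W.[LIVMF][ST]TP[LIVM]{3}")

missed = "RYVDWLLTTPLNV"    # motif region of BACT_HALVA quoted in the text
variant = "RYVDWLLTTPLIV"   # hypothetical: asparagine replaced by isoleucine

print(bool(signature.fullmatch(missed)))
print(bool(signature.fullmatch(variant)))
```

A pattern with no false positives and a single false negative, as here,
is close to ideal; large FALSE_POS or FALSE_NEG counts are the warning
sign to look for.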

BLOCKS, one of the first multiple-motif data- 
bases, is based on families already identified in 
PROSITE [22]. Here, motifs are detected auto- 
matically, using first, a modification of an algor- 
ithm that initially locates three conserved amino 
acids [23] and second, a motif-finding algorithm 
that searches for the highest scoring set of blocks 
that occur in the correct order without overlap- 
ping. Blocks found by both methods are con- 
sidered reliable and are calibrated against 
SWISS-PROT to obtain a measure of the likeli- 
hood of a chance match. The calibrated blocks 
are then concatenated into the BLOCKS data- 
base. An indication of the diagnostic power of a 
block is given in terms of a strength value — 
strong blocks are more effective than weak 
blocks (strength less than 1100) at separating 
true positives from true negatives. In searching 
the database, however, more important than the 
strength of individual blocks is the number of 
blocks matched. High-scoring matches to individ- 
ual blocks seldom have biological significance; 
conversely, matches to sets of blocks from the 
same family are unlikely to have arisen by chance 
(provided they match in the correct order with 
appropriate distances between them) and a probability
value is calculated to reflect that likelihood.
Release 11.0 contains 4034 blocks,
representing 994 groups from PROSITE 15.

Recently, several other BLOCKS databases
have been made available. For example, in
BLOCKS+, supplementing the entries derived
from PROSITE are blocks from families in
PRINTS that are not already in BLOCKS and
then successively, any additional blocks from
Pfam, ProDom and DOMO. BLOCKS+ is thus
comprehensive, containing 9498 blocks from 
2129 sequence groups. Complementing this 
resource is a version of PRINTS in which block- 
scoring methods have been exploited [22]. 
PRINTS' motifs tend to be deeper than those in 
BLOCKS because its source database is larger; 
the diagnostic performance of entries in the two 
resources can therefore differ, BLOCKS-format
PRINTS tending to be more prone to problems
of noise. Because the BLOCKS databases are de- 
rived automatically, their entries are not anno- 
tated, but links are made to the corresponding 
PROSITE and PRINTS documentation files. The 
databases are accessible for searching via the 
Web server at the Fred Hutchinson Cancer 
Research Center in Seattle. 

PRINTS, another of the early responses to the 
diagnostic limitations of regular expression 
matching, is based on the method of fingerprint- 
ing [24]. This approach uses groups of conserved 
motifs to build diagnostic signatures of family 
membership. The process involves manual cre- 
ation of a seed alignment, and location and exci- 
sion of conserved motifs for searching SWISS- 
PROT and TrEMBL. Results are examined to 
determine which sequences have matched all the 
motifs in the fingerprint; if there are more 
matches than were in the initial alignment, the 
additional information from these new sequences 
is added to the motifs and the database is 
searched again. This iterative process is repeated 
until no further complete fingerprint matches can 
be identified. The results are then annotated 
manually (with descriptions of the family, details 
of the structural or functional relevance of the 
motifs where known, cross-references to related 
databases, bibliographic references, etc.) prior to 
inclusion in the database. 

Fingerprint diagnostic performance is indicated 
via a summary that lists how many sequences 
matched all the motifs and how many made only 
partial matches (i.e. failed to match one or more 
motifs). The fewer the partial matches, the better 
the fingerprint. The full potency of the method 
derives from the mutual context provided by
motif neighbours. The more motifs in a fingerprint,
the better able it is to identify distant relatives,
even when parts of the signature are
absent; conversely, the fewer the motifs, the 
poorer the diagnostic performance. Fingerprints 
with only two motifs are diagnostically little bet- 
ter than single motifs and are therefore more 
likely to make false positive matches. When 
searching PRINTS, probability and expect values 
are calculated to assign a measure of confidence 
to both complete and partial matches. 

Within PRINTS, motifs are encoded as un- 
gapped, un-weighted local alignments. An im- 
portant consequence of storing the motifs in this 
'raw' form is that, unlike with regular expressions 
or other abstractions, no sequence information is 
lost. Different scoring methods may thus be 
superposed onto the motifs, conferring different 
scoring potentials, and hence different perspec- 
tives, on the same data. PRINTS may therefore 
provide the raw material for other pattern data- 
bases. PRINTS release 23.0 (June 1999) contains 
1160 entries (6938 motifs), currently making it 
the most comprehensive manually annotated pat- 
tern database. The database is accessible for 
searching via the Web server in the School of 
Biological Sciences at the University of 
Manchester. 

IDENTIFY is derived automatically from 
motifs in BLOCKS and PRINTS [12]. The pro- 
gram used to create the database constructs con- 
sensus expressions from the motifs, adopting a 
permissive approach in which different residues 
are tolerated according to a set of prescribed 
groupings (Table 2). These groups correspond to 
various biochemical properties, theoretically 
ensuring that the resulting expressions have sensi- 
ble biochemical interpretations. However, as 
mentioned earlier, in practice this approach may 
lead to an increase in noise. When searching the 
resource, different levels of stringency are there- 
fore offered from which to infer the significance 
of matches, rendering the approach diagnostically 
more powerful than exact pattern matching 
(which only offers match/no-match diagnoses). 
IDENTIFY is accessible from the Web server in
the Department of Biochemistry at
Stanford University.



Profiles are discriminators distilled from 
sequence information in complete domain align- 
ments. As a result of their potency, they are used 
to complement some of the poorer regular ex- 
pressions in PROSITE, or to provide a diagnos- 
tic alternative where extreme sequence divergence 
renders the use of regular expressions inappropri- 
ate. A compendium of profiles has been created 
at the Swiss Institute for Experimental Cancer 
Research (ISREC) in Lausanne. Each profile has 
separate PROSITE-compatible data and docu- 
mentation files. This allows results that have 
been validated and annotated to an appropriate 
standard to be made available as an integral part 
of PROSITE [20]. As before, diagnostic perform- 
ance can be ascertained from the DR and NR 
lines. Profiles are less prone to make false 
matches than are regular expressions, but the 
numbers released via PROSITE are only small 
(48 in July 1998). Nevertheless, profiles that have 
not yet achieved the necessary standard of vali- 
dation and annotation (241 to date) are available 
for searching via ISREC's Web server. 

Pfam is a collection of HMMs for a range of 
protein domains [25]. The resource is based on 
two distinct classes of alignment: hand-edited 
seed alignments, which are deemed to be accu- 
rate; and an automatically clustered set derived 
from ProDom families. The seed alignments are 
used to build HMMs, to which sequences are 
automatically aligned to generate final full align- 
ments. If the initial alignments do not produce 
diagnostically sound HMMs, the seed is 
improved and the gathering process iterated until 
a good result is achieved. The methods that ulti- 
mately generate the best full alignment may vary 
for different families, so the parameters are saved 
to allow results to be reproduced. The collection 
of seed and full alignments, coupled with mini- 
mal annotations (often no more than a descrip- 
tion line), related database and literature cross- 
references and the HMMs themselves, constitute 
Pfam-A. All sequence domains that are not 
included in Pfam-A are automatically clustered 
and deposited in Pfam-B. Although the methods 
and parameters used to create the full automatic 
alignment are noted, no indication is given of the 
diagnostic performance of a given HMM. Direct
visualisation of the final alignment is therefore
probably the best indicator of how sound its
HMM is likely to be. Pfam is accessible for
searching via the Sanger Centre Web server;
release 4.1 (July 1999) encodes 1488 domains.

Table 4

Some of the major alignment databases; in each case, the primary
source is noted, together with the level of information stored (i.e.
whether domain, family or super-family alignments)

Alignment database   Primary source    Stored information
ProDom               SWISS-PROT        domains
SBASE                SWISS-PROT        domains
ProtoMap             SWISS-PROT        families
PIR-ALN              PIR               super-families, families and domains
PROT-FAM             PIR               super-families, families and domains
ProClass             SWISS-PROT/PIR    super-families, families and domains
DOMO                 SWISS-PROT/PIR    domains and repeats
PIMA                 Entrez            domains



4. Alignment and family-related databases 

In addition to the range of pattern resources 
described above, several alignment databases are 
also available for searching via the Web. The 
construction of alignment and pattern databases 
is based on different principles, so the two types 
of resource should not be confused. The main 
difference between them is that alignment data- 
bases tend to be derived simply by automatic 
clustering of sequence databases. This allows 
them to be more comprehensive than pattern 
resources, because they do not depend on manual 
crafting of family discriminators. However, 
searches of alignment databases are often less 
sensitive because they are usually based on im- 
plementations of BLAST. Some well-known 
alignment resources are listed in Table 4. 

ProDom is an automatic compilation of 'homologous'
domains [26] created via a procedure
based on PSI-BLAST [7]. Version 99.1 contains 
44,345 domains with at least 2 sequences, of 
which 2652 are linked to the Protein Data Bank 
(PDB) [27]. A recent addition to the resource is 
ProDom-CG, a compendium of domains built 
from complete genome data. The database is 
accessible for interrogation with the Sequence 
Retrieval System (SRS) [28] and for BLAST 
searching via the Web server of the Institut 
National de la Recherche Agronomique. 
Emphasis has been placed on the graphical user 
interface, which facilitates analysis of protein re- 
lationships. However, being automatically de- 
rived, no annotations or validations are provided 
and although links are made to the PDB for 
~5% of entries, these are generic links from the 
constituent sequences rather than from the 
domains themselves. Discovering the biological 
meaning of domains can thus be difficult, invol- 
ving extensive cross-checking with other 
resources. 

SBASE is a library of domain sequences de- 
rived from structural and functional segments 
annotated in SWISS-PROT, PIR or the literature 
[29]. Entries are grouped on the basis of standard 
names and further classified on the basis of 
BLAST similarity. The resource, which was 
developed to assist domain recognition, is main- 
tained collaboratively by the International Center 
for Genetic Engineering and Biotechnology 
(ICGEB), Trieste, Italy and the ABC Institute 
for Biochemistry and Protein Research, Godollo, 
Hungary. SBASE is accessible for BLAST 
searching via the ICGEB Web server; version 6.0 
(October 1998) contains 1038 groups. 

ProtoMap classifies sequences in SWISS-PROT 
into groups of related proteins [30]. Clustering is 
effected at different levels of confidence, resulting 
in a hierarchical organisation that divides the 
sequences into well-defined groups, which mostly 
correlate with biological families and superfami- 
lies. The resource was designed to help reveal re- 
lationships between families and to facilitate the 
detection of sub-families. ProtoMap release 2.0 
(July 1998) provides a classification of 72,623 
sequences. The resource is accessible for search- 
ing via the Hebrew University Web server. 

PIR-ALN is a database of annotated protein
sequence alignments derived automatically from 
the PIR-International PSD at the National 
Biomedical Research Foundation in Washington 
[31]. The database includes alignments at super- 
family, family and so-called homology domain 
levels. Sequences are grouped in the same super- 
family if they are similar from end to end; super- 
families are further subdivided into families con- 
taining sequences that are 45% identical; and 
segments corresponding to the same domain in 
two or more super-families are the basis of 
domain alignments. All domain alignments are 
deposited in the DOMAINDB database, which is 
used to screen new sequences for already defined 
domains. The March 1999 release of PIR-ALN 
contains 3983 alignments, including 1480 super- 
family and 371 domain alignments. The resource 
can be queried with the ATLAS information 
retrieval system at the PIR Web site. 

PROT-FAM is based on an automatic cluster- 
ing of the PIR-International PSD at the Munich 
Information Center for Protein Sequences 
(MIPS) [32]. Sequences that share 50% identity 
are clustered into families, and families are 
further clustered into super-families if they share 
~30% identity. Sequences are assigned to the 
same family if they are similar from N- to C-ter- 
minus, while regions showing ~30% identity that 
do not cover the full sequence length are anno- 
tated as domains. Domains are deposited into 
the HOMDOM database, which is used to 
characterise new sequences by means of the pre- 
defined domains. For all families, super-families 
and domains that contain more than one 
sequence, alignments are created using PILEUP 
[33]. The September 1998 release of PROT-FAM 
included 6000 families with two sequences and 
~6500 families containing three or more; ~3800 
super-families derived from more than one 
family; and 361 domains. These are available for 
querying via the MIPS Web site. 
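
The identity cut-offs described for PROT-FAM can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to equal length (with "-" for gaps) and applies the 50% and ~30% thresholds quoted above to a single pair; the function names are hypothetical, and the real resource clusters whole databases and also checks N- to C-terminal coverage, which this sketch omits.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned, non-gap columns holding identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore any column in which either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def classify_pair(a: str, b: str,
                  family_cutoff: float = 50.0,
                  superfamily_cutoff: float = 30.0) -> str:
    """Label a pair using the cut-offs quoted in the text (illustrative only)."""
    pid = percent_identity(a, b)
    if pid >= family_cutoff:
        return "same family"
    if pid >= superfamily_cutoff:
        return "same super-family"
    return "unclassified"
```

For example, `classify_pair("ACDEFGHIKL", "ACDEFAAAAA")` compares ten columns, finds five identical, and so reports a family-level relationship.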

ProClass is a value-added database built upon 
the PIR-International PSD, PROSITE and 
SWISS-PROT [34]. It organises nonredundant 
SWISS-PROT and PIR sequences according to 
relationships defined collectively by PIR super- 
families and PROSITE patterns. By combining
global similarities and motifs into a single classification
scheme, ProClass was designed to facilitate
identification of domain and family
relationships, and classification of multidomain 
proteins. ProClass release 4.0 (September 1998) 
contains 122,253 sequence entries, ~60% of 
which are classified into ~3500 families. The 
resource is available for searching from the PIR 
Web server. 

DOMO is a database of 'homologous' domain 
alignments computed automatically from a non- 
redundant amalgam of SWISS-PROT and PIR 
[35]. The domains have been compiled in FASTA 
format to permit fast searching using BLAST 
and sequence alignment using CLUSTALW [36]. 
The resource was designed as an aid to determine 
domain arrangements, their evolutionary re- 
lationships and their key conserved amino acids. 
DOMO can be queried via SRS at the 
Infobiogen Web site. Release 1.2 (April 1998) 
contains 99,058 domains clustered into 8877 
sequence alignments. Query results are linked to 
other databases to provide complementary infor- 
mation on related proteins and their families. 
Where 3D structures of representative sequences 
are known, links to the atomic coordinates and 
structure classification resources are provided. If 
the domain structure is unknown, pointers are 
given to a composite secondary structure predic- 
tion obtained from a variety of different tech- 
niques. As with other automatically generated 
resources, the structure links are generic and do 
not relate directly to the domains themselves; 
understanding their biological significance can 
therefore be difficult. 

PIMA is a collection of conserved motifs generated
by clustering the NCBI's Entrez database
[37]. For families of two or more sequences,
alignments are created using the pattern-induced 
multiple alignment program [38] and these are 
scanned for the presence of conserved regions. If 
an alignment contains one or more such el- 
ements, additional alignments are created by 
excision of these conserved segments. Currently, 
the PIMA database includes 22,416 alignments, 
each of which contributes a single pattern to the 
resource; it is available for searching with modi- 
fied versions of FASTA via the Baylor College of 
Medicine Search Launcher Web pages. Here,
another database has been created by extracting
the locations of all annotated domains and sites
from sequences contained in the Entrez, 
PROSITE, BLOCKS and PRINTS databases. 
The BEAUTY utility incorporates this infor- 
mation directly into BLAST search results [39]; 
for each match, a schematic display allows direct 
comparison of the locations of conserved regions. 



5. Which database is best? 

The plethora of available databases presents 
bewildering choices to the would-be sequence 
analyst. Which is diagnostically most reliable? 
Which has the most useful annotations? Which is 
the most comprehensive? Which should I use? At 
first sight, the alignment resources appear to be 
the most comprehensive. But they are largely 
based on automatic clustering of sequence data- 
bases and their search tools thus tend to involve 
flavours of BLAST or FASTA, which are less 
sensitive than searches of family-specific patterns. 
It is difficult to assess the quality of particular 
resources and it would be invidious to try. Each 
has different diagnostic strengths and weaknesses, 
each offers different family coverage and different 
levels of annotation — each has certain merits 
and demerits. Nevertheless, some general points 
bear consideration. 

Automatically generated databases carry no 
annotations. The advantage of searching them is 
that they are more comprehensive than their 
manually derived counterparts. The disadvantage 
is that there may be no way to ascertain the bio- 
logical significance of a match, if indeed it has 
any (that a match has been made does not mean 
an evolutionary relationship necessarily exists). 
This is important to understand in light of 
resources that house 'homology domains' — auto- 
matic methods detect similarities, but it is for the 
user to infer homology from supporting biologi- 
cal evidence. Related issues arise in resources 
that calculate evolutionary trees from their auto- 
matically created alignments; if levels of strin- 
gency are sufficiently high, alignments and their 
trees may be sound; but at low stringency, results
are likely to be error prone and relationships
should be inferred with caution. 

Amongst pattern databases, single-motif 
methods that rely on exact regular expression 
pattern-matching have diagnostic limitations; 
such methods demand exact matches, so will fail
to diagnose sequences that contain subtle changes 
not catered for by the pattern. Moreover, single 
motifs offer no biological context within which to 
assess the significance of a match. Multiple-motif 
approaches inherently offer improved diagnostic 
reliability by virtue of the mutual context pro- 
vided by motif neighbours. Thus, if a query fails 
to match all the motifs in a signature, the pattern 
of matches formed by the remaining motifs still 
allows the user to make a confident diagnosis. 
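
The contrast between single-motif and multiple-motif diagnosis can be sketched as follows. The motifs and test sequences here are invented purely for illustration and do not come from PROSITE or PRINTS; real fingerprints are scored with residue frequency matrices rather than regular expressions, so this is a simplification under those stated assumptions.

```python
import re

# Hypothetical single-motif pattern, written as a regular expression.
SINGLE_MOTIF = re.compile(r"C..C[LIVM]H")

# Hypothetical fingerprint: three motifs expected in this order.
FINGERPRINT = [
    re.compile(r"GW[LIVM]A"),
    re.compile(r"C..C[LIVM]H"),
    re.compile(r"DRY[LIVM]"),
]


def matches_single_motif(seq: str) -> bool:
    """Exact pattern matching: one unexpected substitution gives no hit."""
    return SINGLE_MOTIF.search(seq) is not None


def fingerprint_hits(seq: str) -> list:
    """Report which motifs are found, scanning left to right in order."""
    hits = []
    pos = 0
    for motif in FINGERPRINT:
        m = motif.search(seq, pos)
        if m:
            hits.append(True)
            pos = m.end()  # later motifs must lie after this one
        else:
            hits.append(False)
    return hits
```

With the invented sequence `MKGWLAPPCAACLQTTDRYVQ`, in which one residue of the middle motif has changed, the single-motif test reports no match at all, whereas the fingerprint still reports the first and third motifs, leaving a partial pattern from which a diagnosis can be argued.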

Pattern resources derived from existing data- 
bases have the limitation that they offer no 
further family coverage. Nevertheless, they have 
the advantage of implementing different analyti- 
cal methods from their source databases, thus 
offering different scoring potentials on the same 
data and furnishing important opportunities to 
diagnose relationships missed by the original im- 
plementations. 

Finally, manually annotated databases are set 
apart from their automatically created counter- 
parts by virtue of (i) providing validation of 
results and (ii) offering detailed information that 
helps to place conserved sequence information in 
structural or functional contexts. This is vital for 
the user, who not only wants to discover whether 
a sequence has matched a predefined motif, but 
also needs to understand its biological signifi- 
cance. 



6. Composite pattern databases 

If, today, comprehensive sequence analysis 
requires accessing a variety of disparate data- 
bases, gathering the range of different outputs 
and arriving at some sort of consensus view of 
the results, in the future this process should 
become more straightforward. The curators of 
PROSITE, Profiles, PRINTS, Pfam and ProDom 
are currently creating a unified database of protein
families, termed InterPro. The aim is to provide
a single family annotation resource, based
on existing documentations in PROSITE and 
PRINTS and on the minimal annotations pro- 
vided in Pfam. Each InterPro family will link to 
different entries in its satellite pattern databases. 
This will simplify sequence analysis for the user, 
who will thereby have access to a central resource 
for protein family diagnosis. 

This effort is supported by the curators of the 
BLOCKS databases, who, realising the problems 
associated with providing detailed family docu- 
mentation, are developing a dedicated protein 
family Web site, termed ProWeb [40]. This facility
provides information about individual families
via links to existing Web resources maintained by
researchers in their own fields. ProWeb can facilitate
the task of annotators by providing convenient
access to family information and
obviating the need for annotators themselves to 
become 'expert' on all proteins. 

7. Conclusion 

Creating and searching pattern databases are 
activities that lie at different ends of a fallible 
chain of events. We begin with a sequence align- 
ment, we create some kind of scoring function to 
encode the conservation within the alignment (a 
scoring matrix, HMM, etc.), we store the discri- 
minators in a database and we search them with 
different algorithms. Problems arise if unrelated 
sequences have crept into the alignment, which in 
turn lead to errors in the discriminators, which 
then give ambiguous or incorrect search results. 
Alternatively, the discriminators may be sound, 
but the search algorithms may not be sufficiently 
sensitive to allow unequivocal diagnosis, leading 
the user to false conclusions of family ties. If the 
user has performed this experiment on a newly 
determined sequence and submits the results to 
one of the sequence databases, the annotation 
error becomes available for mass propagation. 
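
The first links of this chain (an alignment, then a scoring function that encodes its conservation, then a search) can be sketched in a few lines. This is a minimal illustration that assumes an ungapped motif alignment, a uniform background frequency and a simple pseudocount; none of these choices is taken from any particular database, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build log-odds scores per column from an ungapped alignment."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Sum the column scores for a segment of standard residues."""
    return sum(col[aa] for col, aa in zip(pssm, segment))


def best_window(pssm, seq):
    """Slide the matrix along seq (len(seq) >= matrix width); return (score, start)."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))
```

If an unrelated sequence were added to the input alignment, the column frequencies, and hence every later search score, would shift accordingly, which is the error propagation the paragraph above describes.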

Recently, there has been doom-mongering in 
the literature about the quality of our databases, 
some harbingers of misfortune predicting a future 
error catastrophe. At the same time, claims of 
success for some approaches to family classification
and function prediction have been equally
overdone. A more balanced view recognises that 
our databases and search routines are not per- 
fect, but with the right approach we can avoid 
the pitfalls of jumping to over-pessimistic or 
over-zealous conclusions. 

Until we have sufficient experimental data 
available, pattern and sequence databases are 
probably the best tools we have for accessing the 
functional and evolutionary clues latent in the 
sequences flooding from the genome projects. 
Pattern databases offer several benefits: (i) by dis- 
tilling multiple sequence information into family 
descriptors, trivial errors in the underlying 
sequences may be diluted; (ii) annotation errors 
may be quickly spotted if the description of one 
sequence differs from that of its family; and (iii) 
they allow specific diagnoses, placing individual 
sequences in a family context for a more 
informed assessment of possible function. By 
contrast, searches of sequence databases tend to 
reveal only generic similarities, making precise 
pinpointing of a particular biological niche more 
difficult. 

While there is some overlap between them, the 
contents of the pattern databases differ. Together 
they encode ~2000 families, including globular 
and membrane proteins, modular polypeptides 
and so on. It has been estimated that the total 
number of families might be in the range 1000 to 
10,000, so there is a long way to go before any 
of the databases can be considered complete. 
Thus, in building a search strategy, it is good 
practice to include all available pattern resources, 
to ensure that the analysis is as comprehensive as 
possible and that it takes advantage of a variety 
of search methods. Where there is consensus, 
diagnoses can be made with greater confidence. 
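
The idea of seeking consensus across resources can be sketched as a simple vote. The resource names are those discussed in the text, but the family labels, the `min_support` parameter and the function itself are hypothetical; a real analysis would also weigh the statistical significance of each match.

```python
from collections import Counter


def consensus_diagnosis(hits, min_support=2):
    """hits maps a resource name to the family it reported, or None.

    Returns (label, support); label is None when fewer than
    min_support resources agree on any one family.
    """
    votes = Counter(label for label in hits.values() if label is not None)
    if not votes:
        return None, 0
    # On ties, the label counted first is returned.
    label, support = votes.most_common(1)[0]
    if support < min_support:
        return None, support
    return label, support
```

For instance, `consensus_diagnosis({"PROSITE": "GPCR", "PRINTS": "GPCR", "Pfam": "GPCR", "BLOCKS": None})` returns `("GPCR", 3)`, while two resources that disagree yield no consensus.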

Unfortunately, creating and annotating family 
descriptors is time-consuming, so pattern data- 
bases have not kept pace with the deluge of 
sequence data. Consequently, by comparison 
with the sequence repositories, they are still very 
small. Nevertheless, as they become more com- 
prehensive, as the volume of sequence data 
expands and search outputs become more com- 
plex, their diagnostic potency ensures that pat- 
tern databases will play an increasingly 
important role as the post-genome quest to
assign functional information to raw sequence 
data gains pace. 

atrophy), and it is likely that others have
yet to be found. A search of the working
draft sequence yielded 286 potential
paralogs of the 971 known human
disease genes with entries in OMIM and
either SwissProt or TREMBL protein
databases. A similar screen of 603 classic
drug target proteins identified 18 new
potential paralogs. Together, these 
groups offer an intriguing collection of 
candidates - genes that might cause 
related disorders when mutated or that 
might encode new targets for drug 
screens. 

Our understanding of disease 
mechanisms might also lead to the 
identification of new therapeutic targets. 
Profiling gene-expression changes in 
biological systems that model disease 
might lead to the identification of 
pathways that play a crucial role in 
pathogenesis. Such an endeavor is 
currently under way for the polyglutamine 
expansion diseases. Furthermore, 
consistent genetic changes that occur in 
easily accessible tissues in model 
organisms might provide surrogate 
markers for drug screens. In addition, an 
understanding of the common 
polymorphisms that occur in drug target 
proteins might help predict which patients 
will respond appropriately to therapy. 

What lies ahead? 

With a bounty of information being served 
up, it is important to keep in mind both the 
many strengths and the limitations of the
current data set. The working draft of the
human genome that is accessible in publicly 
available databases includes almost one 
billion base pairs of finished sequence. 
However, nearly 75% of BACs are 
unfinished, currently consisting of as many 
as 10-20 unassembled sequence fragments
each. Unfinished, unassembled sequence 
presents difficulties during gene mapping, 
might contain contamination and might be 
inadvertently assembled to create artificial 
duplications or deletions. Over the coming 
year, it is hoped that full coverage (8-10-fold 
redundancy) will be achieved for clones 
spanning the entire physical map, followed 
shortly thereafter by finished sequence. At 
that point, >96% of the euchromatic human 
genome will be in the database. Closing the 
remaining gaps, which might contain 
biologically important information, will 
require screening additional large-insert 
libraries, a process that is anticipated to take 
until 2003. Finally, new techniques might be
needed to close recalcitrant gaps and to 
generate sequence from heterochromatic 
regions that probably contain highly 
polymorphic tandem repeats. 

While we eagerly await completion of 
the finished genome sequence, our ability 
to mine the information we seek is rapidly 
evolving. Gene prediction, or annotation, 
is much more difficult in humans than in 
the fly, worm or yeast as a result of the
large size of the genome. New computer 
algorithms and high-throughput 
techniques for gene identification and 
verification will be needed. Comparisons
with the genomes of other vertebrates will
probably speed up this process and might
reveal conserved regulatory regions that
control the expression of orthologous
genes. The Mammalian Gene Collection 
Project aims to assemble a comprehensive 
collection of full-length human cDNAs, 
providing a valuable resource for those 
studying gene function. Furthermore, by 
extending the SNP data set to include all 
common variants, the identification of 
disease genes and genetic modifiers 
should be greatly facilitated. This hope - 
of using the genome to help define causes 
and cures for human disease - underlies 
much of the excitement surrounding the 
release of the working draft. 

Techniques & Applications

A compendium of specific motifs for diagnosing GPCR 
subtypes 

Teresa K. Attwood 



Analysis of G-protein-coupled receptor 
(GPCR) subtypes has attracted considerable 
interest because some drugs that act on 
GPCRs cause therapeutic problems as a 
result of their failure to differentiate 
between subtypes. In this article, an 
extensive compendium of diagnostic 
'fingerprints' for GPCR subtypes and their 
families will be described. These 
fingerprints offer new opportunities to 
investigate correlations between specific 
sequence motifs and ligand binding or
G-protein coupling, and are likely to prove
valuable both in seeking novel receptors in 
genome data and in the characterization of 
orphan receptors. 

G-protein-coupled receptors (GPCRs) 
constitute a vast group of cell-surface 
proteins that includes hormone, 
neurotransmitter, growth factor, light and 
odorant receptors. Approximately 2000
members populate ~50 families within the
rhodopsin-like superfamily, accounting for
~1% of the vertebrate genome (Ref. 1). With so
many GPCRs known, and perhaps
hundreds awaiting discovery in the
human genome, these receptors are of
interest to the pharmaceutical industry
because of the opportunities they afford
for yielding novel drug targets (Refs 1-4).

More than 50% of prescription drugs act 
on GPCRs; however, some have efficacy 
problems and limiting side-effects because 
the compounds do not differentiate 
between receptor subtypes. There is thus 

Box 1. Identification of GPCRs using pattern databases



Protein pattern databases are becoming 
increasingly valuable as diagnostic 
resources that complement the ubiquitous 
sequence similarity search tool BLAST. 
Pattern databases house characteristic 
family signatures, which are encoded in 
different ways within the different 
resources: some encode single motifs (e.g. 
PROSITE patterns); others use groups of 
motifs in the form of fingerprints (e.g. 
PRINTS); and others encode virtually the 
full family alignment (e.g. PROSITE 
profiles and Pfam). Because the 
underlying analysis methods are different, 
inevitably the databases have different 
diagnostic strengths and weaknesses. It is 
therefore instructive to compare the 
results of searching a range of these 
resources using the same query sequence. 
A convenient way of doing this is to use 
the InterPro interface at 
http://www.ebi.ac.uk/interpro/scan.html. 
The graphical output in Fig. I shows the 
result of searching PROSITE patterns, 
PROSITE profiles, Pfam and PRINTS with 
the human muscarinic acetylcholine 
receptor, ACM1_HUMAN.

As shown, PROSITE patterns encode 
only single short motifs (yellow), whereas 
PROSITE profiles (orange) and Pfam (blue) 
utilize almost the complete sequence. By 
contrast, PRINTS fingerprints (green) 
encode groups of motifs that differentiate 
between regions of sequence that 
characterize the superfamily (sf) and those
that characterize the family (f) and receptor
subtype (st). Thus, it is evident from the 
comparison that although PROSITE 
patterns, PROSITE profiles and Pfam only 
furnish superfamily diagnoses, PRINTS 
provides a more fine-grained result. The 
detail conferred by a fingerprint match 
lends PRINTS a significant part of its 
diagnostic power. Using PRINTS, we can 
see immediately that the superfamily 
fingerprint encodes seven motifs 
[hyperlinks to the database confirm that 
these are the transmembrane (TM) 
domains], whereas the family and receptor 
subtype fingerprints comprise different 
parts of the terminal, loop and TM regions. 
The mutual context of motif neighbours 
within a fingerprint offers a unique 
diagnostic advantage. By contrast with the 
'pin-point' matches of PROSITE patterns 
and the 'blanket' matches of PROSITE 



Fig. I 

PROSITE pattern (sf ) . 
PROSITE profile (sf ) • 
PRINTS (sf) 
Pfam (sf) 
PRINTS (f) 
PRINTS (St) 



profiles and Pfam, PRINTS motifs explicitly 
capture, and map, functionally and 
structurally important biological features. 
This is valuable for several reasons: (1) in 
analyses of uncharacterized genome data, 
fingerprints are not limited to superfamily- 
level diagnoses, but provide sufficient 
depth to be able to pinpoint particular 
receptor subtypes, thereby facilitating the 
identification of novel receptors (M.D.R. 
Croning and T.K. Attwood, unpublished); 
(2) by storing motifs that differentiate 
between families and between receptor 
subtypes, correlations with specific 
residues involved in ligand-binding and 
G-protein coupling can be investigated; 
and hence (3) such fine-tuning, and the 
explicit encoding of motifs involved in 
ligand-binding, yields greater promise for 
our future ability to characterize orphan 
receptors. 

ACMl HUMAN 

considerable interest in attaining
therapeutic selectivity by identifying the 
single receptor subtype that affects a 
particular physiology. The goal is to be able 
to design drugs without, or at least with 
less, side-effects, while retaining the 
desired function. Muscarinic agonists, for 
example, gained attention in research into 
Alzheimer's disease following the 
realization that the cardiovascular and 
gastrointestinal side-effects of 
nonselective muscarinic agonists could be 
avoided (i.e. muscarinic acetylcholine M1
receptors in the brain might be involved in
cognition, whereas other muscarinic
receptor subtypes regulate heart and
gastrointestinal functions; Ref. 5).

Identification of GPCRs 

Routinely, computational strategies for 
identifying GPCRs tend to involve
searches of sequence databases [e.g. using
standard tools such as BLAST (Ref. 6)]
and sometimes also of so-called 'pattern'
databases, which house diagnostic protein 
family 'signatures' (Box 1). However, it is 
apparent that BLAST 'sees' similarity 
between pairs of sequences in a rather 
limited way: it reveals generic similarities 
(e.g. it can show that the sequences being 
compared share several hydrophobic 
regions) but it cannot recognize individual 
family traits (Ref. 7) (i.e. it cannot distinguish the
differences between the sequences, such
as specific ligand-binding motifs).
Similarly, most pattern databases tend to 
provide generic signatures that are only 
capable of diagnosing superfamily 
relationships. Thus, these databases 
might recognize that a sequence belongs 
to the rhodopsin-like GPCR superfamily, 
but they cannot offer insights into the
particular family to which it belongs. For
researchers interested in, for example, the
treatment of obesity and wishing
specifically to identify type 4 melanocortin
receptors (which are important in
regulating appetite and body weight), a
superfamily-level diagnosis is of limited 
value. Therefore, it seemed that it might 
be advantageous to develop a more fine- 
grained analytical approach for detecting 
GPCRs. 

[Fig. 1. Image not reproduced; panel (b) is labelled with the fingerprints GPCRRHODOPSN, MUSCARINICR and MUSCRINICM1R.]

Fig. 1. (a) Hierarchical diagnosis returned from a PRINTS fingerprint search with the human muscarinic acetylcholine
M1 receptor, ACM1_HUMAN (the search is effected simply by pasting the full sequence, its identifier or its accession
number into the Web form at http://bioinf.man.ac.uk/cgi-bin/dbbrowser/fingerPRINTScan/muppet/FPScan.cgi). The
result shows that three fingerprints have been matched, indicating that the sequence is likely to be a member of the
rhodopsin-like G-protein-coupled receptor (GPCR) superfamily (fingerprint GPCRRHODOPSN), belonging to the
muscarinic receptor family (MUSCARINICR) and being specifically an M1 receptor subtype (MUSCRINICM1R). The
E-values in the centre of the table provide the measure of confidence in the results (E-values indicate the number of
matches one would expect to see by chance: the smaller the number, the more likely the matches are to be
biologically meaningful). Here, the results are all statistically significant (i.e. above the threshold value of 10^-4).
(b) From left to right, the results in (a) are mapped in three dimensions onto a crude model and are illustrated
schematically below. The coloured bars denote the relative locations and lengths of the constituent motifs within
each fingerprint. The different regions that characterize the receptors at each level are clearly evident: motifs in the
superfamily fingerprint encode each of the seven TM domains; those in the family fingerprint encode parts of TM and
loop regions (here, TM domains 1, 3, 4, 5 and 7, the second cytoplasmic, and second and third external loops), the
motifs mostly clustering around the ligand-binding domain; and motifs in the subtype fingerprint are drawn from the
third cytoplasmic loop and the N- and C-terminal domains (not shown in 3D), areas known to be involved in
regulating the selectivity and intensity of G-protein coupling (Ref. 1).

Identification of specific receptor subtypes

To facilitate the identification of
particular subtypes, a systematic
analysis of GPCRs was undertaken.
Sequence alignments were created
manually (Ref. 8) for each of the different
superfamilies and for their families and
receptor subtypes. Regions of similarity
and differences between alignments
were then located and used to build a
range of discriminatory 'fingerprints'.
Fingerprints are groups of conserved 
motifs that together provide a signature 
of family membership {motifs tend to 
reflect functionally or structurally 
important regions within a protein 
family [e.g. transmembrane (TM) 
domains, protein-protein interaction 
sites, ligand-binding sites, and so on], 
thereby characterizing the families in 
which they are found}. For the purposes
of this analysis, within superfamilies the
motifs encoded the only features
common to all members (i.e. the scaffold
of seven TM domains) (Refs 9,10). Conversely, at
the family level, the motifs focused on 
those regions that characterized the 
particular family, but distinguished it 
from the parent superfamily; 
predictably, these were usually small 
parts of TM and loop regions. For 
receptor subtypes, the distinguishing 
traits were largely present in the N- and
C-terminal regions, and in the third
cytoplasmic loop. 

To date, >200 GPCR-specific 
fingerprints have been created and made 
available as an integral component of the 
PRINTS Fingerprint database (Ref. 11)
(http://www.bioinf.man.ac.uk/dbbrowser/
PRINTS/printscontents.html#Receptors).
By searching PRINTS with a given query,
it is thus possible to make a hierarchical
diagnosis, indicating to which superfamily
and family the sequence belongs and
which subtype it most resembles, as
illustrated for the human M1 receptor in
Fig. 1a.
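
The logic of such a hierarchical diagnosis can be sketched as follows. The fingerprint names are those reported in the text for the M1 receptor, but the data structure, the function, the E-values in the usage note and the default cut-off are all hypothetical; this is not the fingerPRINTScan algorithm, only an illustration of reading matches from the most general level down to the most specific.

```python
from dataclasses import dataclass


@dataclass
class FingerprintHit:
    fingerprint: str
    level: str       # "superfamily", "family" or "subtype"
    e_value: float   # expected number of chance matches


LEVELS = ("superfamily", "family", "subtype")


def hierarchical_diagnosis(hits, e_threshold=1e-4):
    """Return the best significant fingerprint per level, top down.

    e_threshold is an illustrative cut-off. The walk stops at the
    first level with no significant match, so a subtype is never
    reported without its parent family and superfamily.
    """
    diagnosis = {}
    for level in LEVELS:
        candidates = [h for h in hits
                      if h.level == level and h.e_value <= e_threshold]
        if not candidates:
            break
        best = min(candidates, key=lambda h: h.e_value)
        diagnosis[level] = best.fingerprint
    return diagnosis
```

Given three hits for GPCRRHODOPSN, MUSCARINICR and MUSCRINICM1R with invented E-values below the cut-off, the function returns all three levels; if the subtype match were not significant, the diagnosis would stop at the family.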

Biological significance of receptor motifs 

To gain a deeper insight into the biological 
relevance of these database matches, the 
results can be rationalized in three 
dimensions by mapping the constituent 
motifs of the different fingerprints onto a 
crude model (Ref. 12). For these purposes, an old
model based on the structure of
bacteriorhodopsin (Ref. 13) was used. Knowing
that this was unlikely accurately to
represent a GPCR (Ref. 14), our aim was
simply to help visualize the relative
three-dimensional (3D) locations of the
motifs, rather than to ascertain precise
atomic positions. As shown in Fig. 1b, the
superfamily fingerprint encodes the 7TM 
scaffold, providing the architectural blue- 
print for all members; the family 
fingerprint focuses on the loop regions 
and on specific portions of the TM 
domains; and the subtype fingerprint is 
drawn from the third cytoplasmic loop 
and the N- and C-terminal domains. This 
is consistent with our expectation that 
portions of the TM segments are likely to 
constitute the ligand-binding domain, 
whereas the large intracellular region, 
unique to each subtype, is likely to 
constitute the effector-coupling domain (Ref. 15).

Similar results can be visualized for 
all the GPCR families housed in PRINTS, 
either using the fingerPRINTScan suite 16 
(Fig. 1a) or the BLAST PRINTS server 17,
both of which are accessible from the
PRINTS home page
(http://www.bioinf.man.ac.uk/dbbrowser/PRINTS).
Alternatively, a powerful new resource
that allows comparison of results from
searches of PRINTS, PROSITE (Ref. 18)
and Pfam 19 is the integrated database of 
protein families, domains and functional 
sites known as InterPro (Ref. 20). By 
means of the graphical output from 
InterPro's sequence search, it is possible 
to place the fingerprint matches in
context and see at a glance which regions
of a sequence are matched by the
different resources. The example
discussed in Box 1 demonstrates the
fine-tuning that fingerprints add to the
diagnostic process.

Concluding remarks 

GPCR fingerprints allow specific 
diagnoses, from the level of the superfamily 
down to the individual receptor subtype. No 
other computational approach currently 
offers such a hierarchical discriminatory 
system for this important class of receptors. 
The resource is thus a valuable 
complement to family and domain 
databases such as PROSITE and Pfam, 
offering potent diagnostic opportunities 
that have not been realised by other 
pattern-recognition methods. 
Furthermore, fingerprint selectivity offers 
new opportunities to explore in more detail 
correlations between specific motifs and 
ligand binding or G-protein coupling. With 
the availability of the first draft of the 
human genome, this collection of diagnostic 
GPCR fingerprints promises to find 
application in computational strategies to 
identify potential new drug targets and to 
characterize orphan receptors. 
P-val FingerPRINTScan submission page

P-val FPScan

Scan PRINTS with a PROTEIN query sequence, using an ID code from one
of the following databases: {SWISSPROT SPTREMBL SWISSNEW TREMBLNEW}
or by pasting it in as a raw sequence.
Please note: DNA sequences are NOT catered for in this software.

Important information concerning the E-value calculation: please read



Please input either an ID code or a raw sequence:

mgfnltlaklpnnelhgqeshnsgnrsdgpgknttlhnef
dtivlpvlyliifvasillnglavwiffhirnktsfifyl
knivvadlimtltfpfrivhdagfgpwyfkfilcrytsvl
fyanmytsivflglisidrylkvvkpfgdsrmysitftkv
lsvcvwvimavlslpniiltngqptednihdcsklksplg
vkwhtavtyvnsclfvavlviligcyiaisryihkssrqf
isqssrkrkhnqsirvvvavyftcflpyhlcrmpstfshl



The E-value threshold determines the level of significance of results
in the 1st table.

E-value threshold: 0.0001

Select Database: Prints32_0 (selected), Prints30_0, Prints31_0,
Blocksplus11, Blocks11

Select Matrix: blos62 (selected), blos45, blos80

Distance variance: 10



Mail any comments, bugs, or suggestions to: 
scordis@bioinf.man.ac.uk 
FingerPRINTScan results page

PRINTS32_0 and matrix blos62

Scan of sequence: USERSEQUENCE

Highest scoring fingerprints for your query

Fingerprint                 E-value         GRAPHScan
GPCRRHODOPSN (relations)    3.118054e-29    Graphic

For further information choose any of the following options:

• Simple - Top Ten
• Detailed - Top Ten (detailed by motif)
Ten top scoring fingerprints for your query

Fingerprint    No. of Motifs  SumId      AveId      PfScore    Pvalue          Evalue     GRAPHScan
GPCRRHODOPSN   7 of 7         1.8e+02    25         1733       1.2e-34         3.1e-29    Graphic
PROTEASEAR     2 of 5         [missing]  [missing]  [missing]  [missing]       [missing]  Graphic
CXCCHMKINER4   2 of 9         [missing]  [missing]  [missing]  [missing]e-07   [missing]  Graphic
DUFFYANTIGEN   3 of 7         [missing]  20.35      626        1.8e-[missing]  [missing]  Graphic
BRADYKININR    2 of 6         59.29      29.64      419        4.3e-06         1.1        Graphic
P2Y12PRNCPTR   2 of 3         54.53      27.26      466        1.4e-05         3.6        Graphic
PAFRECEPTOR    3 of 11        78.40      26.13      [missing]  1.5e-05         3.9        Graphic
ANGIOTENSINR   2 of 8         103.09     51.54      [missing]  2.8e-05         [missing]  Graphic
CELLSNTHASEA   [missing]      39.25      [missing]  [missing]  [missing]       [missing]  Graphic
ACRIFLAVINRP   2 of 9         34.38      17.19      [missing]  5.4e-05         [missing]  Graphic




Ten top scoring fingerprints for your query. Detailed by motif

FingerPrint Name  Motif Number  IdScore  PfScore    Pval      Sequence
GPCRRHODOPSN      1 of 7        23.40    225        8.79e-07  [missing]
                  2 of 7        24.21    [missing]  8.19e-05  FIFYLKNIVVADLIMTLTFPFR
                  3 of 7        35.51    339        1.10e-08  FYANMYTSIVFLGLISIDRYLKV
                  4 of 7        23.63    251        3.97e-04  FTKVLSVCVWVIMAVLSLPNII
                  5 of 7        21.00    134        9.54e-03  VTYVNSCLFVAVLVILIGCYIAIS
                  6 of 7        21.06    264        2.05e-04  HNQSIRVVVAVYFTCFLPYHLCRMP
                  7 of 7        27.45    330        1.97e-07  KEITLFLSACNVCLDPIIYFFMCRSFS
PROTEASEAR        1 of 5        28.57    236        3.07e-04  KNTTLHNEFDTIVLPVLY
                  4 of 5        29.59    224        1.69e-04  QSIRVVVAVYFTCF
CXCCHMKINER4      2 of 9        43.75    346        1.31e-04  HNEFDTIVLPVLYLII
                  4 of 9        35.94    350        1.12e-03  GPWYFKFILCRYTSVL
DUFFYANTIGEN      1 of 7        14.88    146        7.40e-02  LPVLYLIIFVASILLNGLAVWIFF
                  3 of 7        20.78    263        2.81e-03  ILCRYTSVLFYANMYTSIVFLG
                  7 of 7        25.40    217        8.79e-03  HLDRLLDESAQKILYYCKEITLFLSAC
BRADYKININR       2 of 6        33.57    245        5.18e-04  YTSVLFYANMYTSI
                  3 of 6        25.71    174        8.35e-03  AVLSLPNIILTNGQ
P2Y12PRNCPTR      1 of 3        31.25    208        7.77e-03  SRMYSITFTKVLSVCV
                  2 of 3        23.28    258        1.78e-03  HKSSRQFISQSSRKRKHNQSIRVVVAVYF
PAFRECEPTOR       4 of 11       21.15    165        8.94e-02  GLISIDRYLKVVKPFGDSRMYSITFT
                  8 of 11       33.33    332        1.92e-03  PYHLCRMPSTFSHLD
                  10 of 11      23.91    180        8.91e-02  FFMCRSFSRWLFKKSNIRPRSES
ANGIOTENSINR      1 of 8        60.49    253        1.67e-03  LYLIIFVAS
                  4 of 8        42.59    182        1.70e-02  VLSLPNIILTNG
CELLSNTHASEA      5 of 9        15.87    165        5.67e-02  HIRNKTSFIFYLKNIVVADLIMTLTFP
                  9 of 9        23.38    280        9.09e-04  TYVNSCLFVAVLVILIGCYIAI
ACRIFLAVINRP      3 of 9        17.92    168        4.03e-03  GKNTTLHNEFDTIVLPVLYLIIFV
                  7 of 9        16.46    150        1.34e-02  ITFTKVLSVCVWVIMAVLSLPNII




> USER_SEQUENCE

MGFNLTLAKLPNNELHGQESHNSGNRSDGPGKNTTLHNEF
DTIVLPVLYLIIFVASILLNGLAVWIFFHIRNKTSFIFYL
KNIVVADLIMTLTFPFRIVHDAGFGPWYFKFILCRYTSVL
FYANMYTSIVFLGLISIDRYLKVVKPFGDSRMYSITFTKV
LSVCVWVIMAVLSLPNIILTNGQPTEDNIHDCSKLKSPLG
VKWHTAVTYVNSCLFVAVLVILIGCYIAISRYIHKSSRQF
ISQSSRKRKHNQSIRVVVAVYFTCFLPYHLCRMPSTFSHL
DRLLDESAQKILYYCKEITLFLSACNVCLDPIIYFFMCRS
FSRWLFKKSNIRPRSESIRSLQSVRRSEVRIYYDYTDV
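
As a cross-check on the printout above, the short Python sketch below locates the legible GPCRRHODOPSN motif matches (motifs 2 to 7 of the detailed table) within the query sequence. It assumes that the sequence and motif strings, as repaired here from the scanned printout, are accurate; the names `QUERY`, `MOTIFS`, and `locate_motifs` are illustrative and are not part of FingerPRINTScan.

```python
# Query sequence as listed under USER_SEQUENCE (repaired from the scan).
QUERY = (
    "MGFNLTLAKLPNNELHGQESHNSGNRSDGPGKNTTLHNEF"
    "DTIVLPVLYLIIFVASILLNGLAVWIFFHIRNKTSFIFYL"
    "KNIVVADLIMTLTFPFRIVHDAGFGPWYFKFILCRYTSVL"
    "FYANMYTSIVFLGLISIDRYLKVVKPFGDSRMYSITFTKV"
    "LSVCVWVIMAVLSLPNIILTNGQPTEDNIHDCSKLKSPLG"
    "VKWHTAVTYVNSCLFVAVLVILIGCYIAISRYIHKSSRQF"
    "ISQSSRKRKHNQSIRVVVAVYFTCFLPYHLCRMPSTFSHL"
    "DRLLDESAQKILYYCKEITLFLSACNVCLDPIIYFFMCRS"
    "FSRWLFKKSNIRPRSESIRSLQSVRRSEVRIYYDYTDV"
)

# Matched segments reported for GPCRRHODOPSN motifs 2-7
# (the motif 1 segment is not legible in the printout).
MOTIFS = {
    2: "FIFYLKNIVVADLIMTLTFPFR",
    3: "FYANMYTSIVFLGLISIDRYLKV",
    4: "FTKVLSVCVWVIMAVLSLPNII",
    5: "VTYVNSCLFVAVLVILIGCYIAIS",
    6: "HNQSIRVVVAVYFTCFLPYHLCRMP",
    7: "KEITLFLSACNVCLDPIIYFFMCRSFS",
}


def locate_motifs(sequence, motifs):
    """Map motif number to 1-based start position (0 if not found)."""
    return {number: sequence.find(segment) + 1
            for number, segment in motifs.items()}


positions = locate_motifs(QUERY, MOTIFS)
for number, start in sorted(positions.items()):
    end = start + len(MOTIFS[number]) - 1
    print(f"motif {number}: residues {start}-{end}")
```

Under these assumptions every segment is found, and the start positions increase with motif number, so the matched motifs occur in the query in the same order as in the fingerprint.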
Bioinformatics

SEQUENCE AND GENOME ANALYSIS

The application of computational methods to DNA and protein science is a new and exciting
development in biology. Bioinformatics: Sequence and Genome Analysis is a comprehensive
introduction to this emerging field of study. The book has many unique and valuable features:

Underlying algorithms and assumptions are clearly explained for the non-specialist.

Essential for any biologist who wants to understand methods of sequence and
structure analysis and how the necessary computer programs work.

Sequence alignment, structure prediction, phylogenetic and gene prediction, database
searching, and genome analysis are clearly explained and amply illustrated.

Examples are presented in simple numerical terms rather than complex formulas and
notation.

Theoretical underpinnings are linked to biological problems and their solutions.

Extensive tables provide descriptions and Web sources for a broad range of publicly
available software.

An associated Website (www.bioinformaticsonline.org), accessible free of charge by
book purchasers, provides links to Internet sources referred to in the text, as well as
problem sets for classroom use, and other useful material not included in the text.



Based on a well-established course given at the University of Arizona by the author, David 
Mount, this book is an ideal foundation for teaching at an undergraduate and graduate level. 
It is also highly suited for the self-instruction of investigators interested in the application of 
methods and strategies in functional genomics and for the needs of information specialists 
working in molecular biology and pharmaceutical laboratories. 



www.bioinformaticsonline.org 
Database similarity searches have become a mainstay of bioinformatics. Large sequencing
projects in which all the genomic DNA sequence of an organism is obtained have
become quite commonplace. The genomes of a number of model organisms have been
sequenced, including the budding yeast Saccharomyces cerevisiae, the bacterium Escherichia
coli, the worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the human
species Homo sapiens. These species have also been subjected to intense biological analysis
to discover the functions of the genes and encoded proteins. Thus, there is a good deal of
information available as to the biological function of particular sequences in model organisms
that may be exploited to predict the function of similar genes in other organisms. In
addition to genomic DNA sequences, complete cDNA copies of messenger RNAs that
carry all the sequence information for the protein products have also been obtained for
some of the expressed genes of various organisms. Translation of these cDNA copies provides
a close-to-correct prediction of the sequence of the encoded proteins. Because
obtaining intact cDNA sequences is laborious and time-consuming, a common practice is
to make a library of partial cDNA sequences from the expressed genes, and then to perform
high-throughput, low-accuracy sequencing of a large number of these partial sequences,
known as expressed sequence tags (ESTs). The objective of an EST project is to find enough
sequence of each cDNA and to have enough accuracy in the sequence that the amino acid
sequence of a significant length of the encoded protein can be predicted. Overlapping ESTs
can then be combined, and interesting ones can be found by database similarity searches.
The full cDNA sequence of these genes of interest may then be obtained. Once all the
sequence information is collected and placed in the sequence databases, the big task at
hand is to search through the databases to locate similar sequences that are predicted to
have a similar biological function through a close evolutionary relationship.

Sequence database searches can also be remarkably useful for finding the function of
genes whose sequences have been determined in the laboratory. The sequence of the gene
of interest is compared to every sequence in a sequence database, and the similar ones are
identified. Alignments with the best-matching sequences are shown and scored. If a query
sequence can be readily aligned to a database sequence of known function, structure, or
biochemical activity, the query sequence is predicted to have the same function, structure,
or biochemical activity. The strength of these predictions depends on the quality of the
alignment between the sequences. As a rough rule, if more than one-half of the amino acid
sequence of query and database proteins is identical in the sequence alignments, the
prediction is very strong. As the degree of similarity decreases, confidence in the prediction
also decreases. The programs used for these database searches provide statistical
evaluations that serve as a guide for evaluation of the alignment scores.
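
The rough rule above can be illustrated with a minimal Python sketch that computes percent identity for a pair of already-aligned sequences. This is an assumption-laden illustration, not the method of any particular search program: here identity is counted over aligned columns in which neither sequence has a gap, whereas other conventions (for example, dividing by the length of the shorter sequence) give different numbers. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of gap-free aligned columns holding identical residues.

    Both arguments must be rows of the same alignment, so they must
    have equal length; '-' marks a gap.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    columns = [(q, s) for q, s in zip(aligned_query, aligned_subject)
               if q != "-" and s != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for q, s in columns if q == s)
    return 100.0 * identical / len(columns)


# Toy alignment (hypothetical sequences): 7 identities in 8 gap-free columns.
identity = percent_identity("MKT-AYIAK", "MKTQAYLAK")
print(f"{identity:.1f}% identical")
print("strong prediction" if identity > 50.0 else "weaker prediction")
```

The 50% cut-off in the last line simply restates the rule of thumb quoted above; the statistical evaluations produced by the search programs remain the better guide.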

Previous chapters have described methods for aligning sequences or for finding common
patterns within sequences. The purpose of making alignments is to discover whether
or not sequences are homologous or derived from a common ancestor gene. If a homology
relationship can be established, the sequences are likely to have maintained the same
function as they diverged from each other during evolution. If an alignment can be found
that would rarely be observed between random sequences, the sequences are predicted to
be related with a high degree of confidence. The presence of one or more conserved patterns
in a group of sequences is also useful for establishing evolutionary and structure-function
relationships among sequences.

The above methods of establishing sequence relationships have been utilized in database 
searches that are summarized in Table 7.1. In addition to standard searches of a sequence 
The Importance of (Sub)sequence Comparison in
Molecular Biology



Sequence comparison, particularly when combined with the systematic collection, curation,
and search of databases containing biomolecular sequences, has become essential
in modern molecular biology. Commenting on the (then) near-completion of the effort to 
sequence the entire yeast genome (now finished), Stephen Oliver says 

In a short time it will be hard to realize how we managed without the sequence data. Biology 
will never be the same again. [478] 

One fact explains the importance of molecular sequence data and sequence comparison 
in biology. 

The first fact of biological sequence analysis

In biomolecular sequences (DNA, RNA,
or amino acid sequences), high sequence similarity usually implies significant functional
or structural similarity.

Evolution reuses, builds on, duplicates, and modifies "successful" structures (proteins, 
exons, DNA regulatory sequences, morphological features, enzymatic pathways, etc.). 
Life is based on a repertoire of structured and interrelated molecular building blocks that 
are shared and passed around. The same and related molecular structures and mechanisms 
show up repeatedly in the genome of a single species and across a very wide spectrum 
of divergent species. "Duplication with modification" [127, 128, 129, 130] is the central 
paradigm of protein evolution, wherein new proteins and/or new biological functions are 
fashioned from earlier ones. Doolittle emphasizes this point as follows: 

The vast majority of extant proteins are the result of a continuous series of genetic duplications 
and subsequent modifications. As a result, redundancy is a built-in characteristic of protein 
sequences, and we should not be surprised that so many new sequences resemble already 
known sequences. [129] 

He adds that 

... all of biology is based on an enormous redundancy [130] 

The following quotes reinforce this view and suggest the utility of the "enormous 
redundancy" in the practice of molecular biology. The first quote is from Eric Wieschaus, 
cowinner of the 1995 Nobel prize in medicine for work on the genetics of Drosophila 
development. The quote is taken from an Associated Press article of October 9, 1995. 
Describing the work done years earlier, Wieschaus says 

We didn't know it at the time, but we found out everything in life is so similar, that the same 
genes that work in flies are the ones that work in humans. 
And fruit flies aren't special. The following is from a book review on DNA repair [424]:

Throughout the present work we see the insights gained through our ability to look for 
sequence homologies by comparison of the DNA of different species. Studies on yeast are 
remarkable predictors of the human system! 

So "redundancy", and "similarity" are central phenomena in biology. But similarity has 
its limits - humans and flies do differ in some respects. These differences make conserved 
similarities even more significant, which in turn makes comparison and analogy very 
powerful tools in biology. Lesk [297] writes: 

It is characteristic of biological systems that objects that we observe to have a certain form 
arose by evolution from related objects with similar but not identical form. They must,
therefore, be robust, in that they retain the freedom to tolerate some variation. We can take 
advantage of this robustness in our analysis: By identifying and comparing related objects, 
we can distinguish variable and conserved features, and thereby determine what is crucial to 
structure and function. 

The important "related objects" to compare include much more than sequence data, 
because biological universality occurs at many levels of detail. However, it is usually easier 
to acquire and examine sequences than it is to examine fine details of genetics or cellular 
biochemistry or morphology. For example, there are vastly more protein sequences known 
(deduced from underlying DNA sequences) than there are known three-dimensional pro- 
tein structures. And it isn't just a matter of convenience that makes sequences important. 
Rather, the biological sequences encode and reflect the more complex common molecular 
structures and mechanisms that appear as features at the cellular or biochemical levels. 
Moreover, "nowhere in the biological world is the Darwinian notion of 'descent with mod- 
ification' more apparent than in the sequences of genes and gene products" [130]. Hence 
a tractable, though partly heuristic, way to search for functional or structural universality 
in biological systems is to search for similarity and conservation at the sequence level. 
The power of this approach is made clear in the following quotes: 

Today, the most powerful method for inferring the biological function of a gene (or the protein 
that it encodes) is by sequence similarity searching on protein and DNA sequence databases. 
With the development of rapid methods for sequence comparison, both with heuristic al- 
gorithms and powerful parallel computers, discoveries based solely on sequence homology 
have become routine. [360] 

Determining function for a sequence is a matter of tremendous complexity, requiring biolog- 
ical experiments of the highest order of creativity. Nevertheless, with only DNA sequence it 
is possible to execute a computer-based algorithm comparing the sequence to a database of 
previously characterized genes. In about 50% of the cases, such a mechanical comparison 
will indicate a sufficient degree of similarity to suggest a putative enzymatic or structural 
function that might be possessed by the unknown gene. [91] 

Thus large-scale sequence comparison, usually organized as database search, is a very 
powerful tool for biological inference in modern molecular biology. And that tool is almost 
universally used by molecular biologists. It is now standard practice, whenever a new gene 
is cloned and sequenced, to translate its DNA sequence into an amino acid sequence and 
then search for similarities between it and members of the protein databases. No one today 
would even think of publishing the sequence of a newly cloned gene without doing such 
database searches. 
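
The first step of that standard practice, translating a DNA sequence into an amino acid sequence, can be sketched as follows. This is a minimal illustration that assumes the standard genetic code, reads a single forward frame starting at the first base, and stops at the first stop codon; the function name and the example DNA string are hypothetical, and the subsequent database search is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
assert len(AMINO_ACIDS) == 64
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the forward frame of `dna` up to the first stop codon.

    Lowercase input and RNA-style 'U' are accepted; codons containing
    unrecognized characters are translated as 'X'. Trailing bases that
    do not fill a codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")
        if residue == "*":          # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical coding fragment: ATG GGC TTC AAC TAA
peptide = translate("ATGGGCTTCAACTAA")
print(peptide)
```

In practice all six reading frames are usually considered; the single-frame version here is kept short on purpose.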
The final quote reflects the potential total impact on biology of the first fact and its
exploitation in the form of sequence database searching. It is from an article [179] by
Walter Gilbert, Nobel prize winner for the coinvention of a practical DNA sequencing
method. Gilbert writes:

The new paradigm now emerging, is that all the 'genes' will be known (in the sense of being
resident in databases available electronically), and that the starting point of biological
investigation will be theoretical. An individual scientist will begin with a theoretical conjecture,
only then turning to experiment to follow or test that hypothesis.

Already, hundreds (if not thousands) of journal publications appear each year that report 
biological research where sequence comparison and/or database search is an integral part 
of the work. Many such examples that support and illustrate the first fact are distributed 
throughout the book. In particular, several in-depth examples are concentrated in Chap- 
ters 14 and 15 where multiple string comparison and database search are discussed. But 
before discussing those examples, we must first develop, in the next several chapters, the 
techniques used for approximate matching and (sub)sequence comparison. 

Caveat 

The first fact of biological sequence analysis is extremely powerful, and its importance 
will be further illustrated throughout the book. However, there is not a one-to-one corre- 
spondence between sequence and structure or sequence and function, because the converse 
of the first fact is not true. That is, high sequence similarity usually implies significant 
structural or functional similarity (the first fact), but structural or functional similarity 
does not necessarily imply sequence similarity. On the topic of protein structure, F. Cohen 
[106] writes ". . . similar sequences yield similar structures, but quite distinct sequences 
can produce remarkably similar structures". This converse issue is discussed in greater 
depth in Chapter 14, which focuses on multiple sequence comparison. 
ABSTRACT A functional cDNA clone for the histamine H1
receptor was isolated from a cDNA library of bovine adrenal
medulla by a combination of molecular cloning in an expression
vector and electrophysiological assay in Xenopus oocytes. The
H1 receptor cDNA encodes a protein of 491 amino acids (Mr
55,954) with seven putative transmembrane domains, illustrating
the similarity to other receptors that couple with guanine
nucleotide-binding regulatory proteins (G protein-coupled
receptors). The sequence homology between the H1 and H2
receptors is not higher than that between the histamine H1 and
m1-muscarinic receptors. The cloned receptor protein
expressed in COS-7 cells bound specifically to [3H]mepyramine,
an H1 receptor antagonist, and this binding was displaced by
H1 receptor antagonists and histamine with affinities comparable
with those in membranes of bovine adrenal medulla. H1
receptor mRNA was shown to be expressed in brain and in
peripheral tissues, including lung, small intestine, and adrenal
medulla. This investigation discloses the molecular nature of
the H1 receptor, a receptor that mediates diverse neuronal and
peripheral actions of histamine and that may be of therapeutic
importance in allergy.

Since Dale and Laidlaw (1) first reported the contraction of
smooth muscle by histamine, the pharmacological significance
of this phenomenon has been extensively investigated.
Three subtypes of histamine receptor (H1, H2, and H3) are
known. The H1 receptor was identified by Ash and Schild (2)
and H1 receptor antagonists have been used in the therapy of
many allergic diseases, including urticaria, allergic rhinitis,
pollenosis, and bronchial asthma. In peripheral tissues, the
histamine H1 receptor mediates the contraction of smooth
muscles, increase in capillary permeability due to contraction
of terminal venules, and catecholamine release from adrenal
medulla (3), as well as mediating neurotransmission in the
central nervous system (4). Although signal transduction of
the H1 receptor through Ca2+ mobilization via an increase in
the intracellular inositol 1,4,5-trisphosphate level has been
extensively investigated (5, 6), little is known about the
molecular structure of the histamine H1 receptor. Recently,
another method for cDNA cloning of Ca2+-mobilizing receptors
through their expression in Xenopus oocytes has been
developed (7). Meyerhof et al. (8) and Sugama et al. (9) have
reported that the injection of poly(A)+ RNA prepared from
bovine adrenal medulla into Xenopus oocytes resulted in
functional expression of the histamine H1 receptor in
oocytes. The present study describes the cloning and sequencing
of a cDNA encoding histamine H1 receptor‡ from
a cDNA library of bovine adrenal medulla using in vitro RNA
transcription and electrophysiological assay with Xenopus
oocytes.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS 

Materials. [3H]Mepyramine (1073 GBq/mmol) and [α-32P]dCTP
(≈111 TBq/mmol) were purchased from DuPont/NEN.
Histamine and (+)-chlorpheniramine were purchased
from Wako Pure Chemical (Osaka) and Tokyo Kasei (Tokyo),
respectively. Mepyramine and doxepin were purchased
from Sigma. (-)-Chlorpheniramine and famotidine were gifts
from Smith Kline & French and Yamanouchi Pharmaceutical
(Tokyo), respectively. A mammalian expression vector pEF-BOS
(10) was donated by S. Nagata of the Osaka Bioscience
Institute.

Isolation of Poly(A)+ RNA. Total RNA was extracted by the
acid guanidinium isothiocyanate/phenol/chloroform method
(11). Poly(A)+ RNA was isolated by chromatography on
oligo(dT)-cellulose (12).

Expression Cloning of Histamine H1 Receptor cDNA. Bovine
adrenal medullary poly(A)+ RNA (≈180 μg) was size-fractionated
on a 5-25% (wt/vol) sucrose-density gradient.
An aliquot (1 μl) of each poly(A)+ RNA fraction (20 μl) was
injected into Xenopus oocytes, and electrophysiological assay
by measuring Ca2+-dependent inward Cl- currents was
done as described (9). The fraction that showed the highest
histamine-induced inward Cl- currents was used for
oligo(dT)-primed cDNA synthesis. Double-stranded cDNAs of
>2 kilobase (kb) pairs were size-selected by agarose gel
electrophoresis followed by elution with Geneclean II (Bio
101, La Jolla, CA) and were ligated into λZAPII (Stratagene)
at the EcoRI site. The library was divided and amplified in 65
pools of ≈20,000 independent clones each. In vitro transcription
was done essentially according to the procedure of Julius
et al. (13). RNA transcripts (≈5 ng) from each pool were
individually injected into Xenopus oocytes. After incubation
for 1-2 days, the oocytes were tested for inward Cl- currents
induced by 100 μM histamine under a voltage clamp at -60
mV. The single positive pool of 20,000 clones was progressively
subdivided into smaller pools of 8000, 4000, 400, and
15 clones until finally a single clone was obtained. cDNA
encoding the histamine H1 receptor was sequenced by the
M13 chain-termination method (14) using a DNA sequencer
(model 370A, Applied Biosystems). The sequence homology
search was done by using DNASIS (Hitachi Software Engineering,
Yokohama, Japan).

Expression of Histamine H1 Receptor in COS-7 Cells and Its
Determination by [3H]Mepyramine-Binding Assay. An EcoRI
fragment (2.7 kb) of the H1 receptor cDNA was subcloned
into the mammalian expression vector pEF-BOS at the BstXI
site. COS-7 cells were transfected by the DEAE-dextran
method and were harvested after 60 hr (15). Preparation of
membranes from COS-7 cells and [3H]mepyramine-binding
assay were done by a described method (16). Nonspecific
bindings of [3H]mepyramine to both transfected and
nontransfected cells at 2.6 nM radioligand were <10% of total
binding to nontransfected cells. Specific binding of
[3H]mepyramine to the nontransfected cells was observed (basal
control), but that from the transfected cells assayed with 2.6
nM [3H]mepyramine (3.4 pmol/mg of protein) was ≈30 times
the basal control (0.1 pmol/mg of protein). Specific binding
of [3H]mepyramine to the expressed binding site was calculated
by subtracting specific [3H]mepyramine binding to the
nontransfected cells from that to the transfected cells.

*To whom reprint requests should be addressed at: Department of
Pharmacology II, Faculty of Medicine, Osaka University, 2-2
Yamadaoka, Suita 565, Japan.

‡The sequence reported in this paper has been deposited in the
GenBank data base (accession no. D90430).

RNA Blot Analysis. Poly(A)+ RNA prepared from various
bovine tissues was separated (7 μg per lane) by
formaldehyde/1% agarose gel electrophoresis (17) and transferred to
a nylon membrane (Schleicher & Schuell). A 2.7-kb EcoRI
fragment of the histamine H1 receptor cDNA was labeled
with [α-32P]dCTP by the random-priming method and was
used as a probe (18). Hybridization was done at 42°C in 5x
standard saline citrate/20 mM sodium phosphate, pH 7.0/1x
Denhardt's solution/50% (vol/vol) formamide/0.1% SDS/
10% (wt/vol) dextran sulfate/salmon sperm DNA at 100
μg/ml. The membrane was washed with 0.1x standard saline
citrate and 0.1% SDS at 42°C.

Fig. 1. (A) Current trace recorded from a Xenopus oocyte
injected with in vitro synthesized histamine H1 receptor mRNA. (B)
Mepyramine (10 μM) was administered 30 sec before histamine
application. Recordings were obtained at a voltage-clamped membrane
potential of -60 mV. Concentration of histamine applied was
100 μM; horizontal bar indicates duration of application. Data were
reproducible (n = 5), and representative tracings are shown.

[Fig. 2 sequence listing: only fragments of nucleotides 1441-2853 survive and are not legible.]

Fig. 2. Nucleotide and deduced amino acid sequences of the histamine H1 receptor cDNA clone. Sequences of both strands of cDNA were
determined. Positions of the putative transmembrane segments I-VII of the H1 receptor are indicated below amino acid sequence; the terminal
of each segment is tentatively assigned from a hydropathy profile. Triangles indicate potential N-glycosylation sites.

RESULTS 

Isolation of a Histamine H1 Receptor cDNA. Poly(A)+ RNA
isolated from bovine adrenal medulla was size-fractionated in
a sucrose-density gradient. Two peaks giving histamine-evoked
inward currents in oocytes were observed in the size
range of 2.5- to 3.5-kb nucleotides and above 5-kb nucleotides
(data not shown). A cDNA library was constructed from
poly(A)+ RNA in the fraction of 2.5- to 3.5-kb nucleotides
giving the highest response. Of 65 pools tested only one pool
gave small inward currents in response to 100 μM histamine.
After several subdivisions of the positive pool, a single clone
encoding for a functional histamine H1 receptor was isolated;
histamine induced inward Cl- currents in oocytes injected
with in vitro-transcribed mRNA from the cloned histamine
H1 receptor cDNA (Fig. 1), and mepyramine, an H1 receptor
antagonist, at 10^-6 M completely blocked the histamine-induced
response in oocytes.

Primary Structure of the Histamine H1 Receptor. The
nucleotide and deduced amino acid sequences of the bovine
histamine H1 receptor are shown in Fig. 2. The clone (2960
nucleotides long) consisted of 107 nucleotides of the 5'-untranslated
region, 1473 nucleotides of the coding region,
and 1380 nucleotides of the 3'-untranslated region. The
histamine H1 receptor cDNA encodes a protein of 491 amino
acids with an Mr of 55,954.
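The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable names are ours, and it assumes the 1473-nucleotide coding region is counted without the stop codon, since 1473 / 3 gives exactly the 491 residues reported.

```python
# Region lengths (in nucleotides) and protein figures as reported in the text.
FIVE_PRIME_UTR_NT = 107
CODING_REGION_NT = 1473
THREE_PRIME_UTR_NT = 1380
REPORTED_CLONE_NT = 2960
REPORTED_PROTEIN_AA = 491
REPORTED_MR = 55954


def codon_count(coding_nt: int) -> int:
    """Return the number of codons in a coding region of the given length."""
    if coding_nt % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    return coding_nt // 3


# Sum of the three regions should equal the reported clone length.
clone_nt = FIVE_PRIME_UTR_NT + CODING_REGION_NT + THREE_PRIME_UTR_NT
# Assumption: the stop codon is not included in the 1473-nt coding region.
codons = codon_count(CODING_REGION_NT)
# Average mass per residue implied by the reported Mr.
mean_residue_mass = REPORTED_MR / REPORTED_PROTEIN_AA

print(clone_nt == REPORTED_CLONE_NT)
print(codons == REPORTED_PROTEIN_AA)
print(round(mean_residue_mass, 1))
```

The implied mean residue mass is close to the value commonly expected for an average protein, so the three reported numbers are mutually consistent.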

Pharmacological Characterization of [3H]Mepyramine
Binding to the Histamine H1 Receptor Expressed in COS-7
Cells. To characterize the receptor pharmacologically,
the EcoRI fragment (2.7 kb) of the H1 receptor
cDNA was subcloned into the mammalian expression vector
pEF-BOS, and the vector was introduced into monkey kidney
COS-7 cells. After 60-hr incubation, the binding of
[3H]mepyramine to the membranes from the cells was measured.
Specific binding of [3H]mepyramine to the expressed
binding site was saturable, and Scatchard plot analysis indicated
the presence of a single binding site with a Kd value of
3.2 nM and a Bmax value of 6.6 pmol/mg of protein (Fig. 3A).
Ki values of mepyramine and (+)- and (-)-chlorpheniramines
were determined to be 2.6 × 10⁻⁹ M, 8.0 × 10⁻⁹ M,
and 7.6 × 10⁻⁷ M, respectively (Fig. 3B). These Kd and Ki
values and the stereoselectivity of (+)- and (-)-chlorpheniramines
for the binding site expressed in COS-7 cells were
comparable with those for adrenal medullary membranes:
the Kd value was 1.5 × 10⁻⁹ M; Ki values were 1.8 × 10⁻⁹
M (mepyramine), 4.3 × 10⁻⁹ M [(+)-chlorpheniramine], and
4.6 × 10⁻⁷ M [(-)-chlorpheniramine], as described (19).
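As an illustration of how a Scatchard plot yields the two binding parameters, the sketch below generates ideal one-site binding data from the reported Kd (3.2 nM) and the reported 6.6 pmol/mg maximal binding (written here as Bmax), then recovers both from a straight-line fit of bound/free against bound. The free-ligand concentrations and function names are hypothetical; the paper's raw data points are not given in the text.

```python
def bound(free_nM: float, kd_nM: float, bmax: float) -> float:
    """One-site saturation binding: B = Bmax * F / (Kd + F)."""
    return bmax * free_nM / (kd_nM + free_nM)


def scatchard_fit(bound_vals, free_vals):
    """Least-squares line through (B, B/F).

    For a single site, B/F = (Bmax - B) / Kd, so the slope is -1/Kd
    and the x-intercept is Bmax.
    """
    xs = list(bound_vals)
    ys = [b / f for b, f in zip(bound_vals, free_vals)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


KD_NM = 3.2   # reported Kd
BMAX = 6.6    # reported maximal binding, pmol/mg of protein
free_nM = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical ligand concentrations
bound_vals = [bound(f, KD_NM, BMAX) for f in free_nM]

kd_fit, bmax_fit = scatchard_fit(bound_vals, free_nM)
print(round(kd_fit, 3), round(bmax_fit, 3))
```

With real, noisy data a Scatchard transform distorts the error structure, so a direct nonlinear fit of the saturation curve is often preferred; the linear form is shown here only because it is the analysis named in the text.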

Tissue Distribution of Histamine H1 Receptor mRNA. Tissue
distribution of receptor mRNA was determined by RNA
blot analysis (Fig. 4). A band of 3.0-kb nucleotides corresponding
to a histamine H1 receptor mRNA was detected in
various bovine tissues. The level of H1 receptor mRNA was
high in the lung and small intestine, moderate in the adrenal
medulla and uterus, and lower in the cerebral cortex and
spleen. No H1 receptor mRNA was detectable in the cardiac
atrium or liver.

DISCUSSION 

In the present study, we isolated and sequenced a cDNA
clone for the bovine histamine H1 receptor by using an oocyte
expression system and also examined the pharmacological
properties of this receptor and the tissue distribution of its
mRNA.

The cloned cDNA had no poly(A)+ tail, but its size [2960 base
pairs (bp)] was comparable with that of histamine H1 receptor
mRNA determined by RNA blot analysis. The Mr of the encoded
H1 receptor (55,954) was also consistent with the values
estimated by photoaffinity labeling of bovine adrenal medulla
(Mr 53,000-58,000) (19) and in guinea pig tissues (Mr 56,000-57,000)
(20). Hydropathy-profile analysis (21) of the histamine
H1 receptor revealed the existence of seven putative
transmembrane domains, indicating a similar topology to
those proposed for other G protein-coupled receptors. The
histamine H1 receptor also possesses a characteristic large
third cytoplasmic loop and short carboxyl terminus (22), as
do the m1-muscarinic (23) and dopamine D2 (24) receptors.
We observed another ATG codon 39 bp downstream from the
presumed initiation codon. Comparison with the Kozak consensus
sequence (25) indicated that neither of the two ATG
codons had any advantage as an initiation codon. However,
as receptors for biogenic amines and acetylcholine possess
conserved aspartate residues at position 108 as putative
binding sites for their monoamine and tertiary-amine residues
(26), we presume that the upstream ATG codon is the
Fig. 3. Binding of [3H]mepyramine to transfected COS-7 cell membranes. (A) Saturation isotherm of specific binding of [3H]mepyramine
to membranes from COS-7 cells transfected with the receptor cDNA (O). (Inset) Scatchard plot of these data. B/F, bound/free. (B) Inhibition
of [3H]mepyramine binding to transfected COS-7 cell membranes by various drugs. Membranes were incubated with 4 nM [3H]mepyramine and
various concentrations of doxepin (a), mepyramine (o), (+)-chlorpheniramine (Q), (-)-chlorpheniramine (■), famotidine (a), or histamine (•).
Data points are means of triplicate experiments.
Fig. 4. RNA blot analysis of mRNA isolated from various bovine
tissues. Lanes contain 7-µg samples of poly(A)+ RNA from cerebral
cortex (lane 1), lung (lane 2), liver (lane 3), cardiac atrium (lane 4),
small intestine (lane 5), adrenal medulla (lane 6), spleen (lane 7), and
uterus (lane 8). Arrow indicates H1 receptor mRNA.

initiation codon because it would give a histamine H1 receptor
with the conserved aspartate residue at position 108.
The histamine H2 receptor is highly similar to other G
protein-coupled receptors. The sequence of the histamine H1
receptor is compared with those of some other G protein-coupled
receptors in Fig. 5. Sequence homology of transmembrane
domains between H1 and H2 receptors (40.7%)
(27) is not higher than that between H1 and m1-muscarinic
receptors (44.3%) (23).
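Percentages such as the 40.7% and 44.3% quoted above are typically computed as percent identity over aligned positions. The sketch below shows one common way to do this; it is an assumption on our part, since the text does not state which positions were counted or how gaps were handled, and the toy sequences are hypothetical rather than taken from Fig. 5.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned, non-gap columns of two sequences.

    Both inputs must already be aligned to the same length. Columns in
    which either sequence has a gap are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("LAVADLLV", "LSVTDLLL"))
print(percent_identity("AC-DE", "ACFDQ"))
```

Restricting the comparison to transmembrane segments, as the paper does, would simply mean passing only those aligned stretches to the function.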

There are two potential N-glycosylation sites (Asn-5, Asn-18)
in the amino-terminal region with a consensus sequence
Asn-Xaa-Ser/Thr (Fig. 2) (29). Mitsuhashi and Payan (30)
reported regulation of the affinity of the histamine H1 receptor
by its glycosylation. An additional N-glycosylation site
(Asn-187) was observed in the second extracellular loop of
the cloned receptor.
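The Asn-Xaa-Ser/Thr consensus lends itself to a simple pattern search. The sketch below is a minimal illustration: the function name and the toy peptide are hypothetical, and the option to exclude proline at the Xaa position reflects a widely used convention rather than anything stated in the text.

```python
import re


def n_glycosylation_sites(protein: str, exclude_proline: bool = True) -> list:
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr motifs.

    A lookahead is used so that overlapping motifs are all reported.
    exclude_proline=True applies the common (assumed) rule Xaa != Pro.
    """
    xaa = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"N(?={xaa}[ST])")
    return [match.start() + 1 for match in pattern.finditer(protein)]


# Hypothetical peptide: NLS and NGT match; NPT matches only if Pro is allowed.
toy_peptide = "MANLSQNPTGNGTA"
print(n_glycosylation_sites(toy_peptide))
print(n_glycosylation_sites(toy_peptide, exclude_proline=False))
```

A motif match only marks a potential site; whether it is glycosylated also depends on topology, which is why the text distinguishes amino-terminal and extracellular-loop positions.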

The third cytoplasmic loop of the histamine H1 receptor,
which, by analogy, is thought to interact with a G protein, has
many serine and threonine residues that may serve as sites for
phosphorylation by protein kinases (Fig. 2). Signal transduction
through the histamine H1 receptor is depressed by
activation of protein kinase C in various cells (31-33). Thus,
the potential sites of phosphorylation in the third cytoplasmic
loop may play an important role in regulating signal transduction
through the receptor molecule.

Amino acid residues that are conserved in G protein-coupled
receptors were also seen in the H1 receptor: (i) Two
cysteines (Cys-101 and Cys-181) that have been proposed to
form a disulfide bond appear in the first and the second
extracellular loops (34). (ii) An aspartate residue (Asp-74) is
present in the second transmembrane domain. (iii) An anionic
and cationic amino acid pair (Asp-125 and Arg-126) occurs at
the cytoplasmic border of the third transmembrane domain.
(iv) A conserved sequence of 10 amino acids (Leu-460-Pro-469)
is observed in the seventh transmembrane domain.

The H1 receptor mRNA was visualized by RNA blot
analysis in various bovine tissues in which the existence of H1
receptors was reported (3). The presence of the H1 receptor
mRNA in bovine uterus was clearly demonstrated, whereas
pharmacological studies reported either only H2 receptors (35)
or both H1 and H2 receptors (36) in the uterus.
The band of H1 receptor mRNA from brain was unexpectedly
faint (Fig. 4); this observation was surprising because the
[3H]mepyramine-binding capacities of brain membranes from
various species are reported comparable to those of membranes
from peripheral tissues (6). Doxepin is a potent
displacer of [3H]mepyramine bound to the histamine H1
receptor from bovine adrenal medulla (Fig. 3). A doxepin-insensitive
subtype of histamine H1 receptor has been proposed
to be present in brain because the binding capacity of
[3H]doxepin to rat brain membranes is ~10% that of [3H]mepyramine
(37).

Cardiac atrium and liver did not give detectable bands of H1
receptor mRNA (Fig. 4). Pharmacological studies indicate
the presence of H1 receptors in heart (3). However, biochemical
results (20) show that the Mr of the histamine H1 receptor
in guinea pig heart is 68,000, which is larger than the sizes (Mr
56,000-57,000) of these receptors in lung, intestine, and
cerebellum, suggesting a subtype of H1 receptors in heart
whose mRNA does not hybridize with the
cloned cDNA. A relatively large amount of [3H]mepyramine-binding
protein is present in liver and was recently suggested
Fig. 5. Alignment of amino acid sequences of bovine histamine H1 receptor (H1) and some representative G protein-coupled receptors. H2,
canine histamine H2 receptor (27); M1, mouse m1-muscarinic receptor (23); α1, bovine α1C-adrenergic receptor (28); 5HT-1c, rat serotonin 1c
receptor (13); and D2, rat dopamine D2 receptor (24). Amino acid residues shown by boldfaced type in sequences are identical; residues
nonhomologous with H1 receptor sequence in the loop between transmembrane segments V-VI are summed in parentheses. Positions of putative
transmembrane segments I-VII of H1 receptor are indicated.
to be a member of the family of debrisoquine-type cytochrome
P450s (38).

The receptor cDNA clone for the classical histamine
receptor (3), the H1 receptor, isolated in this study, will be
useful for molecular studies of function and regulation of
activities mediated through the H1 receptor molecule and for
molecular analysis of possible H1 receptor subclasses. In situ
and immunocytochemical studies on localization of the H1
receptor will also be helpful in analyzing physiological functions
of histamine in the central nervous system and in
peripheral tissues.
ABSTRACT The H2 subclass of histamine receptors mediates
gastric acid secretion, and antagonists for this receptor
have proven to be effective therapy for acid peptic disorders of
the gastrointestinal tract. The physiological action of histamine
has been shown to be mediated via a guanine nucleotide-binding
protein linked to adenylate cyclase activation and
cellular cAMP generation. We capitalized on the technique of
polymerase chain reaction, using degenerate oligonucleotide
primers based on the known homology between cellular receptors
linked to guanine nucleotide-binding proteins to obtain a
partial-length clone from canine gastric parietal cell cDNA.
This clone was used to obtain a full-length receptor gene from
a canine genomic library. Histamine increased in a dose-dependent
manner cellular cAMP content in L cells permanently
transfected with this gene, and preincubation of the cells
with the H2-selective antagonist cimetidine shifted the dose-response
curve to the right. Cimetidine inhibited the binding of
the radiolabeled H2 receptor-selective ligand [methyl-3H]tiotidine
to the transfected cells in a dose-dependent fashion, but
the H1-selective antagonist diphenhydramine did not. These
data indicate that we have cloned a gene that encodes the H2
subclass of histamine receptors.



Histamine is one of the major determinants of gastric acid 
secretion. On the gastric parietal cell, histamine exerts its 
stimulating action through an H2 subclass of receptor cou- 
pled via a guanine nucleotide-binding protein (G protein) to 
activation of adenylate cyclase and production of cAMP. 
Antagonism of histamine's action at this receptor has been 
the cornerstone of an immense market for pharmacological 
treatment of acid-peptic disorders of the gastrointestinal 
tract. Through its three known receptor subclasses (H1, H2,
and H3), histamine has been shown to exert a broad array of 
other physiological actions as well, including mediation of 
allergic and anaphylactic responses, modulation of cardiac 
contractility and systemic blood pressure, and mediation of 
neural function in the central nervous system (1-4). Despite 
this wealth of pharmacological information, little is known 
about the structure of the histamine receptor. The present 
studies describing the cloning and sequencing of a gene
encoding a protein with the functional characteristics of an 
H2 subclass of histamine receptors provide insight into the 
molecular biology of histamine action. 

In recent years the genes for a family of G protein-linked 
receptors have been cloned, and analysis of the deduced 
structures of their proteins has indicated that they have a 
motif of seven transmembrane regions. Capitalizing on the 
similarities of the amino acids comprising the transmembrane 
regions, Libert et al. have devised a strategy to clone other 
members of this family (5). By using synthetic oligonucleotides
complementary to the DNA encoding the transmembrane
regions of known G protein-linked receptors as primers
for the polymerase chain reaction (PCR), they were able to
generate partial cDNA sequences encoding proteins having 
the common transmembrane motif. We utilized this strategy 
to clone the histamine H2 receptor gene, using cDNA from 
canine gastric parietal cell mRNA as a template. 

MATERIALS AND METHODS 

Isolation of Parietal Cell mRNA. Cells from freshly obtained 
canine fundic mucosa were dispersed by sequential exposure 
to crude collagenase at 0.25 µg/ml and 1 mM EDTA, and a
fraction enriched in parietal cells (70%) was isolated by 
counterflow elutriation by the method of Soll (6). RNA was
extracted by the acid guanidinium isothiocyanate-phenol- 
chloroform method (7), and poly(A) + RNA was obtained by 
oligo(dT)-cellulose chromatography. The poly(A) + RNA 
served as a template for cDNA synthesis using the avian 
myeloblastosis virus reverse transcriptase (Seikagaku Amer- 
ica, Rockville, MD). The cDNA thus obtained functioned as 
a template for the PCR with the oligonucleotide primers 
described below. 

PCR. Oligonucleotides corresponding to the third and sixth 
transmembrane domains of G protein-linked receptors were 
duplicated from the design of Libert et al. (5) with the 
exception that our primers lacked the linker sequences. The 
primers were synthesized by using an Applied Biosystems 
380B DNA synthesizer. The conditions for the PCR were as 
follows: denaturation for 1.5 min at 94°C, annealing for 2 min 
at 45°C, and extension for 4 min at 72°C. The reaction was 
carried out for 30 cycles, and then 20% of the product was 
added to fresh buffer and submitted to another 30 cycles. The 
final reaction products were extracted with phenol/chloro- 
form, 1:1 (vol/vol), and then precipitated with ethanol. DNA 
polymerase I Klenow fragment was used to form blunt-ended 
DNA, and the products of this reaction were electrophoresed 
on a 2% NuSieve/1% Seaplaque gel (FMC). Of the two major 
bands that were produced, the one of ~400 base pairs (bp)
was cut from the gel and subcloned directly into the phage 
M13 sequencing vector (8). Dideoxynucleotide sequencing 
was then performed by the chain-termination method of
Sanger (9) with Sequenase version 2 (United States Biochem- 
ical). 

Genomic Cloning. The partial-length PCR-derived clone
was random-primed (10) with 32P and used as a probe to
screen a canine genomic library (Clontech). Under high-stringency
hybridization [0.9 M sodium chloride/0.09 M
sodium citrate (6x SSC) at 65°C] and wash conditions (0.1x
SSC at 55°C), a single clone exhibited a positive hybridization
signal with the probe. Restriction enzyme mapping of the
DNA insert in this clone revealed an Xba I/Sau I fragment
that contained the partial-length PCR-derived clone, which
was inserted into the M13 vector and sequenced.
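The "6x" and "0.1x" SSC notation scales a fixed 1x recipe, and the text's own figures (0.9 M sodium chloride/0.09 M sodium citrate at 6x) imply 0.15 M and 0.015 M at 1x. The sketch below makes the scaling explicit. The function name is ours, and the sodium-ion total assumes the citrate is supplied as the trisodium salt, which the text does not state.

```python
NACL_M_AT_1X = 0.15      # implied by 0.9 M NaCl at 6x SSC
CITRATE_M_AT_1X = 0.015  # implied by 0.09 M sodium citrate at 6x SSC


def ssc_composition(fold: float) -> dict:
    """Molar composition of an SSC buffer at the given fold concentration."""
    nacl = fold * NACL_M_AT_1X
    citrate = fold * CITRATE_M_AT_1X
    return {
        "NaCl_M": nacl,
        "citrate_M": citrate,
        # Assumption: trisodium citrate, contributing three Na+ per citrate.
        "Na_ion_M": nacl + 3 * citrate,
    }


hybridization = ssc_composition(6)   # 6x SSC, used at 65°C
wash = ssc_composition(0.1)          # 0.1x SSC, used at 55°C
print(hybridization)
print(wash)
```

The sixty-fold drop in salt between hybridization and wash is what makes the wash the stringency-determining step: lower ionic strength destabilizes imperfectly matched duplexes.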

Expression Experiments. The presumed full-length coding 
region of the receptor was subcloned into CMVneo, a 
pUC13-based vector that also contains the lacUV5-SV40
(simian virus 40) promoter (440 bp), Tn5-neo (1400 bp) SV40 
splice site and polyadenylylation signal (320 bp), cytomega- 
lovirus (CMV) promoter (700 bp), and human growth hor- 
mone polyadenylylation signal (700 bp) (11). L cells were 
transfected by the technique of calcium phosphate coprecip- 
itation (12). Permanently transfected L cells were selected by 
adding the neomycin analogue G418 to the culture medium at 
600 µg/liter. The expression of the receptor gene in the
selected clones was examined by RNA blot hybridization 
(Northern) analysis (see below) coupled with functional 
assays as follows. The cells were incubated in Earle's bal- 
anced salt solution with varying concentrations of histamine 
for 60 min at 37°C after a 60-min preincubation in medium 
with or without 100 µM cimetidine. Ice-cold 30% trichloroacetic
acid was added to stop the reaction and precipitate the
cellular protein. After centrifugation for 10 min at 1900 × g,
the supernatant was extracted with ether, lyophilized, and 




Fig. 1. Gel electrophoresis of PCR products from a gastric
parietal cell cDNA template. Positions of size markers (1,353, 603,
and 310 bp) are indicated.

resuspended in 50 mM Tris/2 mM EDTA, pH 7.5. The 
content of cAMP was measured by a competitive protein- 
binding assay using an Amersham kit. For binding studies, 
transfected L cells were plated and grown to confluence in 2.4 



Fig. 2. The nucleotide and deduced amino acid sequence (in single-letter code) of the canine histamine H2 receptor gene.
Fig. 3. Expression of the canine histamine H2 receptor gene in
various tissues. (Left) Northern blot showing the hybridization of 10
µg of poly(A)+ RNA extracted from each of the designated tissues
with the 32P-labeled gene. (Right) Comparison of the expression of
the receptor gene in a fraction of fundic mucosal cells consisting of
roughly 70% parietal cells and a fraction consisting of nearly
100% chief cells, the primary contaminant in the parietal cell-enriched
fraction.

x 1.7 cm multiwell plates. The culture medium was removed, 
and cells were washed twice with Earle's balanced salt 
solution containing 0.1% bovine serum albumin. An aliquot 
(36 nCi; 1 Ci = 37 GBq) of [methyl-3H]tiotidine (87 Ci/mmol;
DuPont) was added to the culture in the presence of either
cimetidine or diphenhydramine; after 1 hr of incubation, the
medium was removed by aspiration. After the cells were
washed twice with phosphate-buffered saline (PBS), pH 7.4,
and lysed with 1% Triton X-100, the radioactivity was
quantified. Maximum binding was determined by incubation
of [methyl-3H]tiotidine with transformed L cells in the absence
of antagonists. Nonspecific binding, which was subtracted
from total binding to obtain specific binding, was
determined as the amount of label remaining bound in the
presence of 100 µM histamine.
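The quantities in this paragraph can be converted into more familiar units with a few lines of arithmetic. The sketch below uses the stated activity (36 nCi), specific activity (87 Ci/mmol), and conversion factor (1 Ci = 37 GBq); the function names and the example counts passed to `specific_binding` are hypothetical.

```python
BQ_PER_CI = 3.7e10  # 1 Ci = 37 GBq, as stated in the text


def curies_to_becquerels(activity_ci: float) -> float:
    """Convert an activity in curies to becquerels (decays per second)."""
    return activity_ci * BQ_PER_CI


def ligand_pmol(activity_ci: float, specific_activity_ci_per_mmol: float) -> float:
    """Amount of radioligand, in pmol, carrying the given activity."""
    mmol = activity_ci / specific_activity_ci_per_mmol
    return mmol * 1e9  # 1 mmol = 1e9 pmol


def specific_binding(total: float, nonspecific: float) -> float:
    """Specific binding = total binding minus nonspecific binding."""
    return total - nonspecific


aliquot_ci = 36e-9  # 36 nCi per aliquot
print(curies_to_becquerels(aliquot_ci))         # activity in Bq
print(round(ligand_pmol(aliquot_ci, 87.0), 3))  # ligand amount in pmol
print(specific_binding(1200.0, 300.0))          # hypothetical counts
```

Nonspecific binding here is whatever label remains bound in the presence of excess unlabeled histamine, so the subtraction isolates the receptor-bound fraction.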

Northern Blots. The expression of the cloned gene was 
examined in various tissues by Northern blot analysis. For 
these studies, poly(A) + RNA was extracted as described 
above, separated on a 1.25% formaldehyde-agarose gel, and 
blotted to nitrocellulose. Hybridization was performed under 
conditions as described (13) with the presumed coding region 
of the receptor gene that had been labeled with 32P by random
priming (10). The final washing of the blot was in 0.1x SSC
at 65°C. 

Fig. 4. Response to exogenously administered histamine of L
cells transfected with a CMVneo vector containing the canine 
histamine H2 receptor gene insert. The data represent means ± SEM 
from four experiments. Response was shifted by addition of 0.1 mM 
cimetidine, an H2 receptor-selective antagonist. 
Fig. 5. Inhibition of [methyl-3H]tiotidine binding to transfected L
cells by diphenhydramine and cimetidine. The data are from a single 
experiment and are virtually identical to the data obtained in two 
other experiments. 

RESULTS 

An ethidium bromide-stained gel of the products of PCR is 
depicted in Fig. 1. As noted above, two major bands of ~400
bp and ~350 bp were produced, and the former band was cut
from the gel and cloned into phage M13. Of 12 clones 
obtained, only 1 had the nucleotide and deduced amino acid 
sequence expected of a G protein-linked seven-transmem- 
brane receptor. Computer analysis of the amino acid se- 
quence of this single clone revealed extensive homology to 
other known G protein-linked receptors, and Kyte-Doolittle 
analysis confirmed the presence of the two hydropathic 
putative transmembrane domains between the third and sixth 
transmembrane sequences upon which the primers were 
based (14). Screening a canine genomic DNA library resulted 
in one clone with a positive hybridization signal. The nucle- 
otide and deduced amino acid sequence of the presumed 
coding region of this gene is depicted in Fig. 2. Northern blot 
analysis showed that the gene was expressed most abundantly
in the gastric fundus and, to a lesser extent, in the brain
(see Fig. 3). Further analysis revealed that parietal cells were 
most likely to be the origin of the positive hybridization signal 
obtained with gastric poly(A) + RNA. 
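The Kyte-Doolittle analysis mentioned above averages per-residue hydropathy values over a sliding window; stretches with a high average are candidate membrane-spanning segments. The sketch below is a minimal illustration using the published Kyte-Doolittle scale. The window of 19 residues and the cutoff of 1.6 are the values commonly associated with that method, not parameters reported in this paper, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every window of the given length."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        running += scores[i] - scores[i - window]
        profile.append(running / window)
    return profile


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean meets the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# Hypothetical sequence: a 19-residue hydrophobic core with charged flanks.
toy = "K" * 10 + "L" * 19 + "D" * 10
print(hydrophobic_windows(toy))
```

Applied to a receptor sequence, consecutive qualifying windows would be merged into segments and counted; seven such segments is the pattern the text describes for this receptor family.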

The L cells transfected with the H2 receptor construct
showed dose-dependent increases in cellular cAMP content
in response to histamine stimulation (Fig. 4), reaching a
maximum response of 217 ± 10% over basal (mean ± SEM;
n = 3) after the 10 µM histamine dose. The dose-response
curve could be shifted to the right by the H2 receptor-selective
antagonist cimetidine. Serotonin, epinephrine, dopamine,
and carbamoylcholine in doses as high as 100 µM
had no effect on cAMP content. Nontransfected L cells, L
cells transfected with a CMVneo vector missing the receptor
gene construct insert, and L cells transfected with a CMVneo
vector containing as an insert a gene encoding the α catalytic
subunit of the cAMP-dependent protein kinase all failed to
demonstrate any response to histamine. Cimetidine displaced
binding of [methyl-3H]tiotidine to transfected cells in a dose-dependent
fashion with an ED50 of 5.5 ± 0.6 × 10⁻⁷ M (mean
± SEM; n = 4) (Fig. 5). In contrast, diphenhydramine, a
relatively selective H1 receptor antagonist, demonstrated no
ability to inhibit [methyl-3H]tiotidine binding except at the highest
dose.
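A rightward shift of an agonist dose-response curve by an antagonist is the behavior expected of simple competitive antagonism, in which the apparent EC50 is multiplied by (1 + [antagonist]/Kb) while the maximum is unchanged. The sketch below illustrates that model only as an assumption; the text reports the shift but does not fit a model. The histamine EC50 and the cimetidine Kb used here are hypothetical placeholders, while the maximum (217% over basal) and the cimetidine concentration (0.1 mM) come from the text and the Fig. 4 legend.

```python
def response(agonist_M: float, ec50_M: float, emax: float, hill: float = 1.0) -> float:
    """Hill-type dose-response: E = Emax * A^n / (EC50^n + A^n)."""
    return emax * agonist_M ** hill / (ec50_M ** hill + agonist_M ** hill)


def shifted_ec50(ec50_M: float, antagonist_M: float, kb_M: float) -> float:
    """Apparent EC50 with a competitive antagonist (assumed model)."""
    return ec50_M * (1.0 + antagonist_M / kb_M)


EMAX = 217.0         # % over basal, from the text
EC50_M = 1e-7        # hypothetical histamine EC50
KB_M = 5e-7          # hypothetical cimetidine dissociation constant
CIMETIDINE_M = 1e-4  # 0.1 mM, as in the Fig. 4 legend

ec50_with_antagonist = shifted_ec50(EC50_M, CIMETIDINE_M, KB_M)
for dose in (1e-8, 1e-7, 1e-6, 1e-5, 1e-4):
    control = response(dose, EC50_M, EMAX)
    blocked = response(dose, ec50_with_antagonist, EMAX)
    print(f"{dose:.0e} M: control {control:.1f}, with antagonist {blocked:.1f}")
```

Under this model the two curves are parallel on a log-dose axis, and the size of the shift at a known antagonist concentration gives an estimate of Kb.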

DISCUSSION 

We utilized the PCR to clone a gene encoding a protein with
the functional properties of a histamine H2 receptor. Although
the approach that we utilized to obtain this clone was
nonspecific, we purposely targeted a particular tissue known
to contain certain G protein-linked receptors of interest,
including those for histamine and gastrin. The ligand of the
receptor encoded by the full-length clone was initially
unknown; however, comparison of the deduced amino
acid sequence to that of other G protein-linked receptors with
presumed seven-transmembrane motifs revealed extensive
homology (Fig. 6). Like the genes encoding many of the other
members of this family, our gene appeared to be devoid of
introns as well (20). Several features of the amino acid
sequence deduced from our gene were notable and provided
clues as to its identity. The first clue was the aspartic acid
residue in the third transmembrane domain. An aspartic acid
in this position has been shown by mutational analysis to be
important for ligand binding to the β-adrenergic receptor,
which is also a member of this receptor family (Fig. 7A). It is
hypothesized that the carboxyl group of the aspartic acid
moiety acts as a counter anion to the cationic amino group of
β-adrenergic agonists (21). Indeed, receptors for a number of
cationic biogenic agonists such as dopamine and acetylcholine
are also characterized by the presence of this aspartic
acid residue, while receptors for other ligands such as peptide
hormones are not. The second structural feature of note was
the absence of the two serine residues present in the fifth
transmembrane region of receptors for catecholamines and
dopamine as highlighted in Fig. 7B. This information suggested
that our clone encoded a novel class of receptor.
However, the conservative substitution of a threonine residue
and an aspartic acid residue for the two serine residues was
of particular interest in view of the data suggesting that the
serines are sites of hydrogen bonding to the hydroxyl groups
present in the catechol ring of adrenergic agonists (22). A
third structural feature of interest (Fig. 6) was the homology
of the carboxyl- and amino-terminal ends of the third cytoplasmic
loop (between the fifth and sixth transmembrane
regions) with comparable regions of the β-adrenergic receptor,
which have been shown previously to be of critical
importance to its linkage to the G protein associated with
adenylate cyclase activation (22, 23).

Fig. 6. Structural comparison of the putative histamine H2 receptor with other G protein-linked receptors. The deduced amino acid
sequences of the receptors (indicated by the conventional single-letter abbreviations) are aligned on the basis of homologous regions, which are
shown by boldface letters. The roman numerals indicate the putative transmembrane domains. CANH2, canine H2 receptor; HAMADRB2,
hamster β2-adrenergic receptor (15); HUMADB3, human β3-adrenergic receptor (16); BOVSUBK, bovine substance K receptor (17);
HUMACHRM2, human M2-muscarinic receptor (18); RATDOP2, rat dopamine D2 receptor (19).

Fig. 7. Structural comparisons of the third (A) and fifth (B)
transmembrane domains of the canine H2 receptor (CANH2) with
those of other G protein-linked receptors: HUMADA2 (26), HUMADB1
(27), and HUMADB3, human α2-, β1-, and β3-adrenergic
receptors; HAMADRA1 and HAMADRB2, hamster α1- and β2-adrenergic
receptors; HUMACHRM2, human M2-muscarinic receptor;
RATDOP2, rat dopamine D2 receptor; RATSUBP (28, 29), rat
substance P receptor; BOVSUBK, bovine substance K receptor;
MAS, product of mas oncogene (30).

This structural information suggested the possibility that 
our clone encoded a receptor for a positively charged bio- 
genic amine linked to adenylate cyclase activation. We 
hypothesized that the most likely such receptor on gastric 
parietal cells would be the H2 subtype of histamine receptor. 
This hypothesis was tested and proven by inserting the 
presumed coding region of the receptor gene into the eukary- 
otic expression vector CMVneo, expressing it in mouse L 
cells, and measuring the changes in cellular cAMP content 
induced by histamine. We characterized further the nature of 
the histamine receptor subtype encoded in our cloned gene 
by demonstrating the specific binding of [methyl-3H]tiotidine,
a labeled H2-receptor antagonist, to L cells transformed
with the receptor gene. Our data confirmed that our
clone encoded the H2 subtype of histamine receptor. 

An interesting feature of our cloned gene is the presence of 
an out-of-frame ATG codon 50 bp upstream of the presumed 
initiation codon of the major open reading frame (Fig. 2). A 
similar short open reading frame upstream of the major open 
reading frame has been described previously for the β-adrenergic
receptor, although its significance is yet unknown
(15, 24). The translation initiation sequence of the major open
reading frame is more consistent with the consensus eukaryotic
translation initiation sequence (25). The transcription
initiation site of our receptor gene has not been determined; 
however, we examined two different receptor gene con- 
structs in L cells, one containing the entire gene sequence as 
described in Fig. 2 and the other lacking the short upstream 
open reading frame. Expression of both of these constructs 
resulted in L cells that exhibited histamine binding and cAMP 
generation in response to histamine (data not shown). While 
we did not compare levels of expression, the upstream 
segment is apparently not essential for histamine receptor 
gene expression. 
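
The "out-of-frame" description of the upstream ATG can be made concrete with a small sketch. It computes the reading-frame offset between an upstream ATG and the main initiation codon. The 50-bp spacing is taken from the text; treating that figure as the distance between the A of each ATG, along with the function names, is an assumption made purely for illustration.

```python
def frame_offset(upstream_atg_pos: int, main_atg_pos: int) -> int:
    """Return the reading-frame offset (0, 1, or 2) between two ATG codons.

    Positions are 0-based indices of the 'A' in each ATG (an assumed
    convention). An offset of 0 means both codons share a reading frame.
    """
    if upstream_atg_pos > main_atg_pos:
        raise ValueError("upstream ATG must precede the main ATG")
    return (main_atg_pos - upstream_atg_pos) % 3


def is_out_of_frame(upstream_atg_pos: int, main_atg_pos: int) -> bool:
    """True when the upstream ATG is not in frame with the main ATG."""
    return frame_offset(upstream_atg_pos, main_atg_pos) != 0


if __name__ == "__main__":
    spacing_bp = 50  # spacing quoted in the text, assumed to be A-to-A
    print(frame_offset(0, spacing_bp), is_out_of_frame(0, spacing_bp))
```

Under this convention any spacing that is not a multiple of three places the upstream ATG in a different frame from the major open reading frame.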

As mentioned above, a major difference in the structural 
features of the H2 receptor and that of catecholamine recep- 
tors is the absence of the two serine residues in the fifth 
transmembrane domain. However, with the knowledge that 
the natural ligand for the former receptor is an imidazole, it 
is possible to speculate on the nature of the ligand-receptor 
interaction. The aspartic and threonine residues that have 
substituted for the serine moieties have the ability to interact 
via hydrogen bonds with the nitrogen moieties on the imida- 
zole ring of histamine. Future mutational analysis of this site 
will be required to substantiate the validity of this model. 
Nonetheless, through modeling and analysis it may be pos- 
sible to define the nature of histamine binding and, perhaps 
more importantly from a therapeutic standpoint, inhibition of 
histamine binding. 

By taking advantage of the marked homology between 
receptors linked to G proteins, we have been successful in 
cloning a gene encoding the H2 subtype of histamine recep- 
tors despite starting without even rudimentary knowledge of 
the biochemistry of this receptor. If there were substantial 
homology among the histamine receptor subtypes as there is, 
for example, among the catecholamine receptor subtypes, it 
might be possible to extend these findings on the H2 receptor
ultimately to structural information on the H1 and H3 receptors
through cloning of their genes as well.

ABSTRACT

Histamine regulates neurotransmitter release in the central and
peripheral nervous systems through H3 presynaptic receptors.
The existence of the histamine H3 receptor was demonstrated
pharmacologically 15 years ago, yet despite intensive efforts,
its molecular identity has remained elusive. As part of a directed
effort to discover novel G protein-coupled receptors through
homology searching of expressed sequence tag databases, we
identified a partial clone (GPCR97) that had significant homology
to biogenic amine receptors. The GPCR97 clone was used
to probe a human thalamus library, which resulted in the isolation
of a full-length clone encoding a putative G protein-coupled
receptor. Homology analysis showed the highest similarity
to M2 muscarinic acetylcholine receptors and overall low
homology to all other biogenic amine receptors. Transfection of
GPCR97 into a variety of cell lines conferred an ability to inhibit
forskolin-stimulated cAMP formation in response to histamine,
but not to acetylcholine or any other biogenic amine. Subsequent
analysis revealed a pharmacological profile practically
indistinguishable from that for the histamine H3 receptor. In situ
hybridization in rat brain revealed high levels of mRNA in all
neuronal systems (such as the cerebral cortex, the thalamus,
and the caudate nucleus) previously associated with H3 receptor
function. Its widespread and abundant neuronal expression
in the brain highlights the significance of histamine as a general
neurotransmitter modulator. The availability of the human H3
receptor cDNA should greatly aid in the development of chemical
and biological reagents, allowing a greater appreciation of
the role of histamine in brain function.



Since its first pharmacological description as an endogenous
substance in 1910 (Barger and Dale, 1910), histamine
has proven to exert tremendous influence over a variety of
physiological processes. Most notable are its roles in the
inflammatory "triple response" and in gastric acid secretion,
which are mediated by H1 (Ash and Schild, 1966) and H2
(Black et al., 1972) receptors, respectively. In the early 1970s
emerged an understanding that histamine is a neurotransmitter
in the central nervous system (Schwartz et al., 1970;
Baudry et al., 1975). In 1983, a third subtype of histamine
receptor, H3, was identified as a presynaptic autoreceptor on
histamine neurons in the brain controlling the stimulated
release of histamine (Arrang et al., 1983). Subsequently, the
H3 receptor has been shown to be a presynaptic heteroreceptor
in nonhistamine-containing neurons in both the central
and peripheral nervous systems (for review, see Hill et al.,
1997). Through the molecular cloning of H1 and H2, these
receptors were proven to belong to the superfamily of G
protein-coupled receptors (GPCRs; Gantz et al., 1991;
Yamashita et al., 1991). For the past 10 years, the histamine H3
receptor has been the target of numerous cloning and purification
attempts, yet its molecular identity has remained an
enigma.

We have initiated an effort to identify and clone orphan
GPCRs as a means to identify novel drug targets and as a
way to discover novel neurotransmitters and peptides. This
is an approach used by many investigators, and it has led to
the successful identification of ligands such as nociceptin
(Reinscheid et al., 1995), prolactin-releasing factor (Hinuma
et al., 1998), the orexins (Sakurai et al., 1998), and, more
recently, apelin (Tatemoto et al., 1998). There are at least 70
orphan GPCRs in the public domain. We have identified,
through searching public and private databases, at least 30
additional putative members of this family via expressed
sequence tags (ESTs). One of these orphan receptors, our
designation GPCR97, was expressed abundantly in the central
nervous system, and its 5'-most sequence shares significant
homology with the putative transmembrane domain
VII of several members of the biogenic amine family of receptors.
Therefore, we investigated the possibility that the
GPCR97 cDNA encodes a novel neurotransmitter receptor.

ABBREVIATIONS: GPCR, G protein-coupled receptor; EST, expressed sequence tag; cAMP, cyclic AMP; PCR, polymerase chain reaction.

Experimental Procedures 

Materials. Human mRNA and all Northern blots were purchased 
from Clontech (Palo Alto, CA). cDNA synthesis kits were purchased 
from Gibco Life Technologies (Gaithersburg, MD). Gelzyme was ob- 
tained from Invitrogen (San Diego, CA), and pCIneo vector was 
obtained from Promega (Madison, WI). All cell lines were obtained 
from American Type Culture Collection (Manassas, VA). Cyclic AMP 
(cAMP) Flashplates were obtained from DuPont/New England Nuclear
(Boston, MA). Fluo-3 was purchased from TEF Laboratories
(Austin, TX). G418 was purchased from Calbiochem (San Diego, CA).
All histamine ligands were purchased from Research Biochemicals, 
Inc. (Natick, MA). All other reagents were purchased from Sigma 
Chemical Co. (St. Louis, MO). 

Cloning of GPCR97 cDNA. A human thalamus cDNA library
was constructed from poly(A)+-selected RNA as described by the
manufacturer (Gibco Life Technologies). Double-stranded DNA was
digested with NotI and then run on a 0.8% low-melting agarose gel,
and cDNA in the range of 2.5 to 5 kilobases (kb) was excised, purified
with Gelzyme, and subsequently subcloned into pSport vector.
The size-selected human thalamus cDNA library was screened with 
a radiolabeled fragment of the GPCR97 EST clone. A full-length 
GPCR97 was obtained and, subsequently, cloned into the mamma- 
lian expression vector pCIneo (Promega) and transfected into human 
embryonic kidney 293 cells, rat C6 glioma cells, and human SK- 
N-MC neuroblastoma cells. 

Transfection of Cells with GPCR97 cDNA. Cells were grown
to about 70% to 80% confluence and then removed from the plate
with trypsin and pelleted in a clinical centrifuge. The pellet was then
resuspended in 400 µl of complete media and transferred to an
electroporation cuvette with a 0.4-cm gap between the electrodes (no.
165-2088; Bio-Rad Laboratories, Hercules, CA). One microgram of
supercoiled DNA was added to the cells and mixed. The voltage for
the electroporation was set at 0.25 kV and the capacitance was set at
960 µF. After electroporation, the cells were diluted into 10 ml of
complete media and were plated onto four 10-cm dishes at the fol- 
lowing ratios: 1:20, 1:10, 1:5, and the remaining cells. The cells were 
allowed to recover for 24 h before the addition of G-418. Colonies that 
survived selection were grown and tested. Several different cell lines 
were used for transfection, which served two purposes. First, because 
single-cell cloning can often uncover endogenously expressed recep- 
tors (unpublished observations), it is imperative to see the desired 
function in multiple transfections in different cell lines. Second, each 
cell line has a unique characteristic that can be used to enhance 
different aspects of the study. For example, C6 cells grow very fast 
and are easy to culture and, thus, are good for generating lots of 
membranes for binding. SK-N-MC cells give robust cAMP accumu- 
lation and give efficient coupling for inhibition of adenylate cyclase. 
L cells consistently transfect well and have few endogenous receptors,
and, thus, are good for reliable initial characterization of
recombinant receptors. It should be noted that inhibition of adenylate
cyclase and [3H]R-α-methylhistamine binding were observed in all of
the GPCR97-transfected cells. Only the best responding cell lines
were used for further study. 

cAMP Accumulation. Transfected cells were plated on 96-well
plates. Overnight cultures were then incubated with Dulbecco's
modified Eagle's medium-F12 media containing isobutylmethylxanthine
(2 mM) for 20 min, treated with agonists, antagonists, or both for 5
min, and then treated with forskolin (10 µM) for 20 min. The reaction
was stopped with 1/5 volume 0.5 N HCl. Cell media were then tested
for cAMP concentration by radioimmunoassay with cAMP Flashplates.

Calcium Mobilization. Transfected cells were plated on black
96-well plates with clear bottoms. Overnight cultures were then
incubated with Dulbecco's modified Eagle's medium-F12 media containing
the fluorescent calcium indicator fluo-3 (4 µM) and probenecid
(2 mM) for 60 min. Ligand-induced fluorescence was then measured
on a Fluorometric Imaging Plate Reader (FLIPR; Molecular
Devices, Sunnyvale, CA).

R-α-Methyl[3H]histamine Binding. Cell pellets from GPCR97-expressing
C6 cells were homogenized in 20 mM Tris-HCl/0.5 mM
EDTA. Supernatants from an 800g spin were collected and recentrifuged
at 30,000g for 30 min. Pellets were rehomogenized in 50 mM
Tris/5 mM EDTA (pH 7.4). Membranes were incubated with 0.4 nM
R-α-methyl[3H]histamine plus/minus test compounds for 45 min at
25°C and harvested by rapid filtration over GF/C glass fiber filters
(pretreated with 0.3% polyethylenimine), followed by four washes
with ice-cold buffer. Nonspecific binding was defined with 10 µM
histamine. pKi values were calculated based on a Kd of 150 pM and
a ligand concentration of 400 pM (Cheng and Prusoff, 1973).
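
The conversion cited here can be sketched as follows. The code applies the standard Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), using the Kd (150 pM) and radioligand concentration (400 pM) stated in the text. The function names and the example pIC50 value are hypothetical and are not taken from the paper.

```python
import math

KD_M = 150e-12      # radioligand Kd stated in the text (150 pM)
LIGAND_M = 400e-12  # radioligand concentration stated in the text (400 pM)


def ki_from_ic50(ic50_m: float, ligand_m: float = LIGAND_M,
                 kd_m: float = KD_M) -> float:
    """Cheng-Prusoff correction: Ki = IC50 / (1 + [L]/Kd), all in molar."""
    return ic50_m / (1.0 + ligand_m / kd_m)


def pki_from_pic50(pic50: float, ligand_m: float = LIGAND_M,
                   kd_m: float = KD_M) -> float:
    """Convert a pIC50 (-log10 of molar IC50) to the corresponding pKi."""
    ic50_m = 10.0 ** (-pic50)
    return -math.log10(ki_from_ic50(ic50_m, ligand_m, kd_m))


if __name__ == "__main__":
    example_pic50 = 9.0  # hypothetical value, for illustration only
    print(round(pki_from_pic50(example_pic50), 2))
```

With these two constants the correction factor is fixed at 1 + 400/150, so every pKi exceeds its pIC50 by the same amount, log10(11/3).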

In Situ Hybridization. Three adult male Sprague-Dawley rats 
were perfused with 4% paraformaldehyde in 0.1 M borate buffer 
fixative, and their brain tissues were postfixed overnight in fixative 
with 10% sucrose and frozen in dry ice. Five 1-in-5 series of 30-µm-thick
coronal sections of the whole brain were cut on a sliding
microtome and mounted onto glass slides. In situ hybridization was
performed with 35S-riboprobes on this tissue by an adapted protocol
(Simmons et al., 1989). Then the tissue samples were put on X-ray 
film for 1 day, after which they were dipped in NBT2 nuclear emul- 
sion (Eastman Kodak Co., Rochester, NY), and kept desiccated in the 
dark at 4°C for 6 days. Slides were developed, were Nissl stained, 
and were studied under the microscope to identify structures labeled 
with the GPCR97 cRNA probe. 

RNA Probes. The cRNA probe was constructed from a partial rat 
GPCR97 cDNA clone originally identified by polymerase chain reaction
(PCR) amplification from rat brain cDNA with primers designed
against the human receptor (5' primer,
5'-AGTCGGATCCAGCTACGACCGCTTCCTGTC-3'; 3' primer,
5'-AGTCAAGCTTGGAGCCCCTCTTGAGTGAGC-3'). The resulting £07-base pair (bp)
fragment was ligated into pBluescript (Stratagene, La Jolla, CA).
35S-UTP-labeled antisense and sense probes for rat GPCR97 were
synthesized after linearization with BamHI or HindIII with T7 or T3
RNA polymerase, respectively. The labeled sense strands served as
controls and did not show any specific labeling of cellular localization
(data not shown). Specific activities of 35S-UTP probes were approximately
2 to 3 × 10^6 counts per minute/µg. All restriction enzymes
and phage RNA polymerases were obtained from Boehringer Mann- 
heim (Indianapolis, IN). 
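
As a small check on the primer design described above, the sketch below scans the two primer sequences for the BamHI (GGATCC) and HindIII (AAGCTT) recognition sites. The primer sequences are those given in the text with the line-break hyphens and spaces removed. The helper name is hypothetical, and reading the embedded sites as cloning handles is an inference, not something the text states.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Primer sequences from the text, written 5' to 3'.
PRIMERS = {
    "5' primer": "AGTCGGATCCAGCTACGACCGCTTCCTGTC",
    "3' primer": "AGTCAAGCTTGGAGCCCCTCTTGAGTGAGC",
}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Map each enzyme whose site occurs in seq to its first 0-based index."""
    seq = seq.upper()
    return {name: seq.find(motif) for name, motif in sites.items()
            if motif in seq}


if __name__ == "__main__":
    for label, primer in PRIMERS.items():
        print(label, find_sites(primer))
```

Each primer carries one of the two sites immediately after a four-base 5' overhang, which matches the later linearization with BamHI or HindIII.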

Northern Blot Analysis. Northern blots obtained from Clontech
(Palo Alto, CA) were hybridized with α-32P-dCTP-labeled (Amersham
Pharmacia Biotech, Piscataway, NJ) human GPCR97 cDNA as
described by the manufacturer (Expresshyb, Clontech). Two million
counts per milliliter was used in a total volume of 10 ml of hybridization
buffer and incubated at 68°C for 2 h. The blot was then
washed two times at RT in 2× standard saline citrate and 0.05%
SDS for 30 min each. It was further washed two more times for 30
min each at 60°C and exposed overnight to film.

Results 

Cloning and Sequence Analysis of GPCR97 cDNA. 

GPCR97 was initially identified as an EST in a basic local
alignment search tool (Altschul et al., 1990) search of the LifeSeq
database (Incyte Pharmaceuticals, Palo Alto, CA) with
the α2-adrenergic receptor sequence as a query. The 5' end of
the GPCR97 EST had approximately 35% homology to the
seventh transmembrane domain of the α2-adrenergic receptor.
Semiquantitative PCR of GPCR97 with cDNA templates
from a variety of human tissues showed expression predominantly
in the central nervous system, with the greatest
intensity in the thalamus. Therefore, we constructed a size-selected
human thalamus cDNA library and screened it with
the original EST fragment as a labeled probe. From this
screen, a full-length 2.7-kb clone consisting of a 298-bp
5'-untranslated region, a 1335-bp open reading frame, and a
1100-bp 3'-untranslated region was obtained. Translation of
the open reading frame revealed a 445-amino acid coding
region with low homology (20-27%) to the biogenic amine
subfamily of GPCRs. Most notable was an aspartic acid residue
in the putative transmembrane domain III, the putative
binding site for the primary amine, which is a clear hallmark
of the biogenic amine receptor subfamily (Fig. 1). This conserved
aspartic acid residue is shown in the alignment of the
predicted amino acid sequence of GPCR97 with the human
histamine H1 and H2 receptors. Overall homology between
GPCR97 and the H1 and H2 receptors is 22% and 21.4%,
respectively.

TM1
H1      MSLPN SSC LLEDKMCEGNKTTMAS PQLMPLVWIj S T I CLVTVGLNLLVL YAVR
H2      MAPNG TASSFCL-DSTAC -- K ITITVVIiAVLILITVAGNVVVCLAVG
GPCR97  ME RAP PDGPLNASGALAGDAAAAGGARGFSAAWTAVLAALMALIi I VATVLGN AL VM L AFV
        * * ** * * *

TM2
H1      SERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLLMSKWSLGRPLCIiPWLSMDYVASTA
H2      LNRRLRNLTNCFIVSLAITDLLLGLLVLPFSAIYQLSCKWSPGKVFCNIYTSLDVMLCTA
GPCR97  ADSSLRTQNNFFLLNLAISDFLVGAFCIPLYVPYVLTGRWTFGRGLCKLWLVVDYLLCTS
        * * * * * * ***** * *

TM3
H1      SIFSVFILCIDRYRSVQQPLRYLKYRTKTR-ASATILGAWFLSFLWVIP -- ILGWNHFMQ
H2      SILNLFMISLDRYCAVMDPLRYPVLVTPVR-VAISLVLIWVISITLSFLSIHLGWNSRNE
GPCR97  SAFNIVLISYDRFLSVTRAVSYRAQQGDTRRAVRKMLLVWVLAFLLYGP-AILSWEYLSG
        * ***** * * *

TM5
H1      QTSVRRED-KCETDFYDVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHCQHRELINR
H2      TSKGNHTTSKCKVQVNEV -- YGLVDGLVTFYLPLLIMCITYYRIFKVARDQAKR INH
GPCR97  GSS IPEGH-- C YAEFFYNWYFLITASTLEFFTPFLSVTFFNLS I YLNI --Q-RRTRLRLD
        * * * * *

H1      SLPSFSEIKLRPENPKGDAKKPGKESPWEVLKRKPKDAGGGSVLKSPSQTPKEMKSPWF
H2      ISSWKAATIREH
GPCR97  GAREAAGPEPPPEAQPSPPPPPGCWGCWQKGHGEAMPLHRYGVGEAAVGAEAGEATLGGG
        *

H1      SQEDDREVDKLYCFPLDIVHMQAAAEGSSRDYVAVNRSHGQLKTDEQGLNTHGASEISED
H2
GPCR97  GGGGSVASPTSSSGSSSRGTERPRSLKRGSKPSASSASLEKRMKMVSQSFTQRFRLSRDR

H1      QMLGDSQSFSRTDSDTTTETAPGKGKLRSGSNTGLDYIKFTWKRLRSHSRQYVSGLHMNR
H2
GPCR97

TM6
H1      ERKAAKQLGFIMAAFILCWI PYFIFFMVI AFCKNCCNEHL
H2      -- KATVTLAAVMGAFIICWFPYFTAFVYRGLRGDDAINEVLEAIVNASQLSRTQSREPRQ
GPCR97  -- KVAKSLAVIVSIFGLCWAP YTLLMI IRAACHGHCVPDYW
        * * ******

TM7
H1      HMFTIWLGYINSTLNPLIYPLCNENFKKTFKRILHIRS
H2      QEEKPLKLQVWSGTEVTAPQGATDRLWLGYANSALNPILYAALNRDFRTGYQQLFCCRL
GPCR97  YETSFWLLWANSAVNPVLYPLCHHSFRRAFTKLLCPQK
        ** ***** *

H1
H2      ANRNSHKTSLRS --
GPCR97  LKIQPHSSLEHCWK

Fig. 1. Amino acid sequence of human GPCR97 receptor compared with the human histamine H1 and H2 receptors. Putative transmembrane domains
are stated above the sequence and indicated by a solid line. Residues that are identical among all three receptors are indicated by an * below the
sequence. DNA and protein sequences have been deposited with GenBank (accession no. AF140538).
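
The overall homology figures quoted for GPCR97 versus the H1 and H2 receptors are the kind of number that can be computed from a pairwise alignment. The sketch below counts identical residues over aligned, non-gap columns. The text does not say how its percentages were derived, so this definition of identity, the function name, and the use of short transmembrane domain III fragments from Fig. 1 as input are all assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap.

    Both inputs must be rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Start of the TM3 block in Fig. 1 (13 residues from each row).
    h1_tm3 = "SIFSVFILCIDRY"
    h2_tm3 = "SILNLFMISLDRY"
    gpcr97_tm3 = "SAFNIVLISYDRF"
    print(percent_identity(gpcr97_tm3, h1_tm3))
    print(percent_identity(gpcr97_tm3, h2_tm3))
```

Identity over a conserved transmembrane fragment will be much higher than the whole-sequence values, which also include the poorly conserved loops and termini.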

GPCR97-Expressing Cells Inhibit Adenylate Cyclase
in Response to Histamine. Given the homology of GPCR97
to the biogenic amine family, we first tested its ability to
respond to several of the amine neurotransmitters, measuring
either the stimulation of calcium mobilization or the
increase or decrease of cAMP accumulation in mouse L cells.
The biogenic amine ligands tested (acetylcholine, dopamine,
imidazole, epinephrine, tryptamine, serotonin, and histamine)
were negative for an increase in either calcium mobilization
or cAMP accumulation (not shown). However, after
forskolin stimulation of basal cAMP accumulation, there was
a selective and marked inhibition of adenylate cyclase in
response to histamine in the transfected cell line but not in
the nontransfected cell line (Fig. 2). This effect was mimicked
by the high-affinity H3 agonist R-α-methylhistamine, which
has an EC50 of 1 nM (Fig. 3). In addition, the effect of
R-α-methylhistamine could be blocked by the known selective
H3 antagonists thioperamide and clobenpropit (Fig. 3)
but not by the H1 antagonist diphenhydramine (Fig. 3) or the
H2 antagonist ranitidine (not shown).
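
To illustrate what an EC50 of 1 nM implies for the agonist response, the sketch below evaluates a simple one-site concentration-response (Hill) function. The EC50 is taken from the text. The unit Hill slope, the function name, and the choice to express the effect as a fraction of the maximal inhibition are assumptions, since the text does not state the model fitted to the Fig. 3 data.

```python
def fractional_response(conc_m: float, ec50_m: float = 1e-9,
                        hill: float = 1.0) -> float:
    """Fraction of the maximal agonist effect at a given molar concentration.

    Uses response = C^n / (C^n + EC50^n); n = 1 is an assumed Hill slope.
    """
    if conc_m < 0:
        raise ValueError("concentration must be non-negative")
    if conc_m == 0:
        return 0.0
    ratio = (conc_m / ec50_m) ** hill
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical concentrations spanning the EC50 stated in the text.
    for conc in (1e-11, 1e-10, 1e-9, 1e-8, 1e-7):
        print(f"{conc:.0e} M -> {fractional_response(conc):.3f}")
```

By construction the function returns one half of the maximal effect when the agonist concentration equals the EC50.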

GPCR97-Expressing Cells Bind the High-Affinity
Histamine H3 Ligand R-α-Methyl[3H]histamine. To confirm
the H3 pharmacology, we examined whether the
GPCR97-transfected cells could bind the H3 ligand
R-α-methyl[3H]histamine. For these studies, we transfected a
different cell line (C6 glioma cells) because of its ability to
grow fast. C6 cells transfected with GPCR97 were able to
bind [3H]R-α-methylhistamine with high affinity (Fig. 4, inset),
whereas untransfected cells had no demonstrable binding
(not shown). In addition, the known H3 agonists (histamine,
imetit, and N-methylhistamine) and antagonists
(thioperamide and clobenpropit) could all compete for binding
(Fig. 4) with a rank order of potency consistent with that
described for the histamine H3 receptor (Table 1).

Fig. 2. Inhibition of cAMP accumulation in response to the various amine
transmitters. Cells were treated with 10 µM forskolin 5 min after the
addition of compounds (1 µM) and incubated for an additional 20 min. All
values were determined in duplicate. Error bars represent S.E.M.

GPCR97 is Expressed Abundantly in the Central
Nervous System. Because the pharmacological profile of
GPCR97 was consistent with the histamine H3 receptor, we
investigated the mRNA distribution and compared it to the
known distribution of H3 binding sites. Northern blots of
human mRNA showed expression only in the brain, most
notably in the thalamus and the caudate nucleus (Fig. 5).
Little expression was observed in any peripheral tissue examined
(heart, placenta, lung, liver, skeletal muscle, kidney,
pancreas, spleen, thymus, prostate, testis, ovaries, small intestine,
colon, stomach, thyroid, lymph node, trachea, and
bone marrow; data not shown). To obtain a rat homolog of the
GPCR97 cDNA, we used oligonucleotide primers designed
from the human sequence to amplify a cDNA fragment from
RNA extracted from rat brain. This rat cDNA probe (which
has 85% nucleotide identity to human GPCR97) was subsequently
used to examine the tissue distribution of GPCR97-encoded
mRNA by in situ hybridization in rat brain sections.
GPCR97 mRNA is abundantly expressed in rat brain and is
most notably observed throughout the thalamus, the ventromedial
hypothalamus, and the caudate nucleus (Fig. 6, A and
B). Strong expression was also seen in layers II, V, and VIb of
the cerebral cortex, in the pyramidal layers (CA1 and CA2) of
the hippocampus, and in olfactory tubercle (Fig. 6, A and B).
Because the H3 receptor functions as an inhibitory presynaptic
receptor, it is expected that the mRNA localization may
not exactly match the functional receptor localization, depending
on the axonal length of the neuron expressing it. For
example, noradrenergic cells in the locus ceruleus project to
all areas of the cerebral cortex where histamine, via H3
receptors, is known to regulate noradrenaline release
(Schlicker et al., 1989; Smits and Mulder, 1991). Therefore, it
was predicted and confirmed that the mRNA for GPCR97
was expressed in the locus ceruleus (Fig. 6, C and E). In
addition, because the H3 receptor has also been functionally
demonstrated on the histamine terminals in the cerebral
cortex (Arrang et al., 1983), its mRNA must also be located in
the histaminergic cell bodies in the tuberomammillary nuclei.
This was also confirmed for GPCR97 (Fig. 6D).



There are numerous reports of presynaptic H3 receptors in
the autonomic nervous system controlling neurotransmitter
release in the heart, the lung, and the gastrointestinal tract
(Arrang et al., 1988; Molderings et al., 1992; Bertaccini and
Coruzzi, 1995; Imamura et al., 1995; Stark et al., 1996a).
GPCR97 mRNA was detected by PCR amplification in RNA
extracted from human small intestine, testis, and prostate
tissues, but was not detected in these tissues by Northern
blot analysis (not shown). If GPCR97 was only expressed in
the neuronal plexus, its overall low abundance in a whole
tissue preparation could account for this discrepancy. We are
currently investigating via in situ hybridization whether the
GPCR97 receptor mRNA is produced in the ganglia of the
autonomic and enteric nervous systems. An alternative explanation
for the absence of clear peripheral expression could
be the existence of additional subtypes of the H3 receptor,
which previously has been suggested based on pharmacological
evidence (West et al., 1990; Raible et al., 1994; Leurs et
al., 1996; Schlicker et al., 1996).

[Plot axis label: Log [R-α-methylhistamine] (M)]

Fig. 3. Inhibition of cAMP accumulation in response to the agonist
R-α-methylhistamine. Cells were treated with 10 µM forskolin 5 min
after the addition of R-α-methylhistamine and incubated for an additional
20 min. Where indicated, antagonists (1 µM) were incubated 5 min
before the addition of the agonist: agonist alone (■), with diphenhydramine (♦),
with thioperamide (▲), or with clobenpropit (•). All values were determined
in triplicate. Error bars represent S.E.M.

Fig. 4. Top, saturation isotherm and Scatchard transformation (inset) of
R-α-methyl[3H]histamine binding to GPCR97-transfected C6 cells. Total binding
(■), nonspecific binding (▲), and specific binding (○) are shown. Bottom,
competition binding of [3H]R-α-methylhistamine (0.4 nM) in the presence
of various concentrations of H3 agonists and antagonists. KD was calculated
as -1/slope from the linear Scatchard transformation. pIC50 values
were determined by a single site curve fitting program (Prism; GraphPad
Software, San Diego, CA) and converted to pKi values according to Cheng
and Prusoff (1973).

TABLE 1

pKi values of known histamine agonists and antagonists

Compound             pKi
N-methylhistamine    -9.8
Imetit               -9.7
Immepip              -9.7
Clobenpropit         -9.3
Histamine            -8.5
Thioperamide         -7.7
Ranitidine           >-5
Diphenhydramine      >-5
Clozapine            >-5
Cirazoline           >-5
Mepyramine           >-5
Imidazole            >-5

Values were determined by competition binding with R-α-methyl[3H]histamine to
GPCR97-expressing cell membranes.

Fig. 5. Northern blot analysis of human brain mRNA samples (5 µg of
poly(A)+ RNA/lane). Lane 1, amygdala; lane 2, caudate; lane 3, corpus
callosum; lane 4, hippocampus; lane 5, whole brain; lane 6, substantia
nigra; lane 7, thalamus. The probe was the full-length GPCR97 coding
sequence. Exposure time to film was 3 days (-80°C).

Fig. 6. Distribution of GPCR97 mRNA in rat brain. Representative film
autoradiograms of coronal sections arranged rostral to caudal (A-C) and
darkfield photomicrographs of coronal brain sections showing GPCR97
mRNA in the ventral portion of the tuberomammillary nucleus (D), and
in the locus ceruleus (E). Magnification, D = 100× and E = 40×. Abbreviations:
CA1, CA2, pyramidal layers of the hippocampus; CP, caudoputamen;
Cx, cortex; EPd, endopiriform nucleus, dorsal part; LC, locus
ceruleus; OT, olfactory tubercle; Th, thalamus; TTMv, tuberomammillary
nucleus, ventral portion; VMH, ventromedial hypothalamus.

Discussion

The present data describe the cloning and characterization
of a novel GPCR, GPCR97, with a pharmacology and a
tissue distribution that are consistent with the histamine H3
receptor subtype. We found that cells transfected with
GPCR97 were able to inhibit adenylate cyclase in response to
histamine. Because the two known cloned histamine receptors,
H1 and H2, activate phosphoinositide hydrolysis and
stimulate adenylate cyclase, respectively, the inhibition
of adenylate cyclase that we observed is a new finding for a
cloned histamine receptor. It should be noted that previous
experiments with pertussis toxin- and histamine-stimulated
[35S]GTPγS binding have suggested that the H3 receptor
might be Gi-linked (Clark et al., 1993; Laitinen and Jokinen,
1998). Because the putative H3 histamine receptor has been
pharmacologically defined (Arrang et al., 1987; Leurs et al.,
1998), we were able to test known selective agonists and
antagonists. The selective H3 agonist R-α-methylhistamine
was able to potently and dose-dependently inhibit forskolin-stimulated
adenylate cyclase, an effect that was mimicked by
two additional H3 agonists, imetit and N-α-methylhistamine
(data not shown). In addition, the effect of R-α-methylhistamine
was blocked by the selective H3 antagonists thioperamide
and clobenpropit but not by the H1 or H2 antagonists
diphenhydramine or ranitidine. GPCR97-transfected cells
also bound the high-affinity H3 agonist R-α-methyl[3H]histamine.
All of the tested H3 agonists and antagonists could compete for
specific R-α-methyl[3H]histamine binding with similar potencies
to those reported for these compounds to brain membranes
(Hill et al., 1997). It has been suggested that clozapine
may impart some of its antipsychotic effects in humans
through H3 receptor antagonism (Kathmann et al., 1994;
Rodrigues et al., 1995; Stark et al., 1996b). We found that
clozapine did not significantly compete for binding to the
recombinant human receptor (Table 1). These differences in
pharmacology may be because of species differences or
possible H3 heterogeneity (West et al., 1990).

One of the most striking features of this receptor is the
abundant expression in the central nervous system, particularly
in the caudate, the thalamus, and the cortex. Thus, it is
surprising that this receptor cDNA has eluded so many cloning
attempts over the years. To explain the previous unsuccessful
attempts to clone the H3 receptor, we compared the
sequence of GPCR97 to that of the H1 and H2 receptors (Fig.
1). The low overall homology among these three receptors
suggests, in retrospect, that low-stringency hybridization approaches
or degenerate PCR would not have been fruitful. In
addition, we searched the public EST databases with the
entire H3 receptor mRNA sequence. We found that the H3
receptor exists in the public domain in several clones derived
from human brain libraries. However, all of these clones
primarily contain only a 3'-untranslated sequence, suggesting
that there may be some secondary structure present that
prevents a full-length H3-encoding mRNA from being efficiently
copied by reverse transcription. Our success in
screening the human thalamus may be due to its abundance
in that specific brain region, coupled with the fact that we
size-selected for mRNAs greater than 2.5 kb.

There are many questions that remain to be answered 
about the histamine H3 receptor that we can now begin to
answer with the cDNA. For example, are there additional H3
receptor subtypes? What additional neurotransmitter systems
are regulated by histamine H3 receptors? Are H3 receptors
expressed on nonneuronal cells in the periphery? We are
currently seeking to answer some of these questions. In addition,
we are inactivating the H3 receptor gene in mice (i.e.,
knockout mice) to identify its role in central nervous system
function and memory control and as a means to look for
additional phenotypes, which may lead to a better understanding
of the physiological role of H3 receptors in normal
and pathological states. 

Orphan G protein-coupled
receptors: a neglected
opportunity for pioneer
drug discovery

Jeffrey M. Stadel, Shelagh Wilson and
Derk J. Bergsma

Access to DNA databases has introduced an exciting 
new dimension to the way biomedical research is 
conducted. 'Genomic research' offers tremendous
opportunity for accelerating the identification of the 
cause of disease at the molecular level and thereby 
foster the discovery of more selective medicines to 
improve human health and longevity. The current 
challenge is to close the gap rapidly between gene 
identification and clinical development of efficacious 
therapeutics. In the present review, Jeffrey Stadel, 
Shelagh Wilson and Derk Bergsma outline the
rationale and describe strategies for converting one 
large class of novel genes, orphan G protein-coupled 
receptors (GPCRs), into therapeutic targets. 
Historically, the superfamily of GPCRs has proven to be
among the most successful drug targets and 
consequently these newly isolated orphan receptors 
have great potential for pioneer drug discovery. 

The advent of rapid DNA sequencing spawned the 
'genomic era', which has led to the initiation of the Human 
Genome Project. The novel technologies developed
in association with genomic research have already had a
significant impact on the way investigations into the
basis of disease are being conducted and will, no doubt,
substantially enhance the means by which diseases are 
diagnosed and treated in the near future. To keep pace 
with the evolution of molecular medicine, the pharma- 
ceutical industry has embraced genomics and is attempt- 
ing to exploit the new technologies to identify novel tar- 
gets for drug discovery. The major questions that remain 
to be addressed concern how to convert genomic 
sequences into therapeutic targets in an expeditious 
manner and eventually to obtain pharmaceutical drugs 
that will enhance the quality of life. This review will deal 
with a single class of novel molecular targets, focusing 
on the burgeoning collection of G protein-coupled
receptors (GPCRs) called 'orphan' receptors (Ref. 1).
GPCRs are a superfamily of integral plasma membrane
proteins involved in a broad array of signalling pathways.
Since the first cloning of GPCR gene sequences
over a decade ago, novel members of the GPCR
superfamily have continued to emerge through cloning
activities as well as through bioinformatic analyses of
sequence databases, although their ligands are unidentified
and their physiological relevance remains to be
defined. These 'orphan' receptors provide a rich source
of potential targets for drug discovery.

The members of the GPCR superfamily are related 
both structurally and functionally. The signature motif 
of these receptors is seven distinct hydrophobic
domains, each of which is 20-30 amino acids long and
which are linked by hydrophilic amino acid sequences of
varied length (Refs 2,3). Biophysical (Ref. 4) and biochemical (Ref. 5) studies
support the notion that these receptors are intercalated
into the plasma membrane with the amino terminus
extracellular and the carboxy terminus in the cytoplasmic
portion of the cell. Therefore, these receptors are
often referred to as seven transmembrane (or 7TM)
receptors. While it is not yet known how many individual
genes actually encode these receptors, it is clear that this
family of proteins is one of the largest yet identified.
Functionally, GPCRs share in common the property that
upon agonist binding they transmit signals across the
plasma membrane through an interaction with heterotrimeric
G proteins (Refs 6,7). These receptors respond to a vast
range of agents (Refs 2,5,8) such as protein hormones,
chemokines, peptides, small biogenic amines, lipid-derived
messengers, divalent cations (e.g. a Ca2+ sensor
has been identified that is a GPCR; Ref. 9) and even proteases
such as thrombin, which activates its receptor by cleaving
off a portion of the amino terminus (Ref. 10). Finally, these
receptors play an important role in sensory perception
including vision and smell (Refs 2,5,8). Correlated with the broad
range of agents that activate these receptors is their existence
in a wide variety of cells and tissue types, indicating
that they play roles in a diverse range of physiological
processes. It is likely, therefore, that the GPCR
superfamily is involved in a variety of pathologies. This
point was recently emphasized by the surprising discovery
that certain GPCRs for chemokines act as co-factors
for HIV infection (Refs 11-13).
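
As a rough illustration of how the seven-hydrophobic-domain signature described above might be looked for computationally, the following Python sketch scans a protein sequence with a sliding-window hydropathy average. The Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold and the function name `hydrophobic_segments` are all assumptions made for this example; the review does not name any particular method, and real transmembrane prediction tools are considerably more elaborate.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; not taken from the review).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans (0-based, end exclusive) covered by runs of
    consecutive windows whose mean hydropathy is at or above the threshold.

    Unknown residue letters score 0.0. Because each span is the union of
    whole windows, it is usually a little wider than the hydrophobic core.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    scores = [
        sum(KD.get(res, 0.0) for res in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a hydrophobic run begins
        elif score < threshold and start is not None:
            segments.append((start, i - 1 + window))
            start = None                   # the run has ended
    if start is not None:                  # run extends to the last window
        segments.append((start, len(scores) - 1 + window))
    return segments


if __name__ == "__main__":
    # Synthetic toy sequence: seven 22-residue leucine stretches separated
    # by 15-residue aspartate linkers (purely illustrative, not a real GPCR).
    linker = "D" * 15
    toy = linker + "".join("L" * 22 + linker for _ in range(7))
    print(len(hydrophobic_segments(toy)), "candidate hydrophobic segments")
```

A count of seven candidate segments of roughly 20-30 residues would be consistent with the 7TM signature, but a hit of this kind is only suggestive and would need confirmation by sequence comparison with known family members.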

GPCRs represent the primary mechanism by which 
cells sense alterations in their external environment and 
convey that information to the cells' interior. The binding 
of an agonist to the receptor promotes conformational 
changes in the cytoplasmic domains that lead to the 
interaction of the receptor with its cognate G protein(s). 
Agonist-promoted coupling between receptors and G 
proteins leads to the activation of intracellular effectors 
that substantially amplify the production of second 
messengers feeding into the signalling cascade. Since 
effectors are often enzymes [e.g. adenylate cyclase (Ref. 14),
which converts ATP to cAMP, or phospholipase C
(Ref. 15), which hydrolyses inositol lipids in membranes
to release inositol trisphosphate, which in turn mobilizes
Ca2+ within a cell] or ion channels (Ref. 16), many second
messenger molecules can be produced as the result of a
single agonist binding event with its receptor. Changes
in the intracellular levels of ions or cAMP, or both,

Fig. 1. Comparison of the protein sequence identity of the orphan APJ receptor with the angiotensin AT1 receptor. The filled circles indicate amino acid
identity (29.9%) between the two G protein-coupled receptors (GPCRs). This is a typical example of the protein sequence identity shared between orphan
and known GPCRs.



result in the modulation of distinct phosphorylation 
cascades (Refs 17,18), extending through the cytosol to the
nucleus, that eventually culminate in the physiological
response of the cell to the extracellular stimulus.
Although the overall paradigm is apparently the same
for all GPCRs, the diversity of receptors, G proteins
and effectors suggests a myriad of potential signalling
processes and this becomes an important concept as we 
try to identify the function of orphan GPCRs. 

To date, more than 800 GPCRs have actually been 
cloned from a variety of eukaryotic species, from fungi to
humans [see L. F. Kolakowski in GCRDb-WWW The G
Protein-Coupled Receptor DataBase World-Wide-Web 
Site (http://receptor.mgM^ 

html.org)]. For humans, the most represented species, 
about 140 GPCRs have been cloned for which the cog- 
nate ligands are also known. This number excludes the 
sensory olfactory receptors, of which hundreds to thou- 
sands are predicted to exist. By traditional molecular 
genetic approaches, coupled with the explosion in 
genomic information, it has been possible to identify 
more than 100 additional orphan GPCR family members. 
By definition, there is enough sequence information in 
the receptor cDNAs to place them clearly in the super- 
family of GPCRs, but often there is insufficient sequence 
homology with known members of this family to be able 
to assign their ligands with confidence or predict their 
function. In total, there are currently over 240 human 
GPCRs, excluding sensory receptors. As the size of 
sequence databases continues to increase, this list is 
expected to grow to 400, and perhaps even to 1000 or 
more unique gene products. The list will grow even further
as paralogues and alternatively spliced GPCR variants
emerge. Most orphan GPCRs share a low degree of
sequence homology (typically about 25-35% overall
amino acid sequence identity) with known GPCRs, suggesting
that they belong to new subgroups of receptors
(Fig. 1) (Refs 19,20). Indeed, several orphan GPCRs show closer
homology to each other than to known GPCRs. Nevertheless,
the majority of orphan receptors are phylogenetically
distributed among a broad spectrum of distantly
related, known receptor subgroups.
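
To make the "overall amino acid sequence identity" figures quoted above concrete, here is a minimal Python sketch of one way such a percentage can be computed from two sequences that have already been aligned. The function name `percent_identity`, the use of `-` as the gap character and the choice to divide by the number of gap-free columns are assumptions for this example; the review does not state which denominator was used, and other conventions (for instance dividing by the shorter sequence length) give somewhat different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both inputs must be the same length, i.e. already aligned with gap
    characters inserted. Columns where either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue                       # ignore gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    if compared == 0:
        return 0.0                         # nothing to compare
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```

Under this convention, a value in the 25-35% range over the full alignment would correspond to the typical orphan-versus-known GPCR comparison described in the text.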

What is the rationale for investing considerable time 
and resources into trying to establish the function of 
orphan GPCRs? Simply stated, GPCRs have a proven 
history of being excellent therapeutic targets. Within the 
past 20 years, several hundred new drugs have been reg- 
istered that are directed towards activating or antagon- 
izing GPCRs; in fact, it is estimated that most current 
research within the pharmaceutical industry is focused 
on this signalling pathway (Ref. 21). Table 1 shows a representative
snapshot of a variety of receptors, disease targets
and corresponding drugs. It is clear from this table that
the therapeutic targets span a wide range of disorders 
and disease states. Another example of the significance 
and versatility of GPCRs is the number of cases of genetic 
diseases that are linked to defects in these proteins; some 
of these diseases are indicated in Table 2 (Refs 22-38). It 
is likely that many more genetic diseases will be mapped 
to GPCRs as the era of genomics continues to expand and 
families with inherited mutations are examined much 
more comprehensively. 

The importance of GPCRs to drug discovery continues 
to be manifested by the fact that across the pharmaceuti- 
cal industry active research projects, ranging from basic 
studies all the way through to advanced development, 
are focused on GPCRs as primary targets. Molecular 
biology has had a dramatic influence on these efforts. 

Table 1. Examples of marketed drugs for G protein-coupled receptors (GPCRs)

GPCR                            Generic           Drug          Indication
Muscarinic acetylcholine        Bethanechol       Urecholine    GI
                                Dicyclomine       Bentyl        GI
                                Ipratropium       Atrovent      CP
Adrenoceptor
  β1                            Atenolol          Tenormin      CP
  α2                            Clonidine         Catapres      CP
  [illegible]                   Propranolol       Inderal       CP
  α1                            Terazosin         Hytrin        CP
  β2                            Albuterol         Ventolin      CP
  [illegible]                   Carvedilol        Coreg         CP
Angiotensin
  AT1                           Losartan          Cozaar        CP
                                Eprosartan        Teveten       CP
Calcitonin                      Calcitonin        Calcimar      Osteoporosis
                                eel-Calcitonin    Elcatonin     Osteoporosis
Dopamine
                                Metoclopramide    Reglan        GI
  [illegible]                   Ropinirole        Requip        CNS
                                Haloperidol       Haldol        CNS
Gonadotropin-releasing factor   Goserelin         Zoladex       Cancer
                                Nafarelin         Synarel       Endometriosis
Histamine
  H1                            Dimenhydrinate    Dramamine     CNS
  H1                            Terfenadine       Seldane
  H2                            Cimetidine        Tagamet       GI
                                Ranitidine        Zantac        GI
Serotonin (5-HT)
  5-HT1D                        Sumatriptan       Imitrex       CNS
                                Ritanserin        Tisertan      CNS
  5-HT4                         Cisapride         Propulsid     GI
  [illegible]                   Trazodone         Desyrel       CNS
                                Clozapine         Clozaril      CNS
Leukotriene                     Pranlukast        Onon          CP
                                Zafirlukast       Accolate      CP
Opioid
  [illegible]                   Buprenorphine     Buprenex      CNS
                                Butorphanol       Stadol        CNS
                                Alfentanil        Alfenta       CNS
                                Morphine          Kadian        CNS
Oxytocin                                          Syntocinon    Labour
Prostaglandin                   Epoprostenol      Flolan        CP
                                Misoprostol       Cytotec       GI
Somatostatin                    Octreotide        Sandostatin   Cancer
Vasopressin                     Desmopressin                    CP/Renal

CP, cardiopulmonary system; GI, gastrointestinal system.

Table 2. Diseases associated with mutations of G protein-coupled receptors (GPCRs)

GPCR                                Mutation                            Disease                               Refs
Rhodopsin                           Missense: Pro23 to His (NT)         Retinitis pigmentosa                  22, 23
                                    Missense: Val87 to Asp (2TM)
                                    Missense: Tyr176 to Cys (2EL)
                                    Nonsense: Gln344 to Stop (CT)
Thyroid stimulating hormone         Missense: Asp619 to Gly (3IL)       Hyperfunctioning thyroid adenomas     24
                                    Missense: Ala623 to Ile (3IL)
Luteinizing hormone                 Missense: Asp578 to Gly (6TM)       Precocious puberty                    25
Vasopressin V2                      Missense: Arg137 to His (2IL)       X-linked nephrogenic diabetes         26-28
                                    Missense: Gly185 to Cys (2EL)
                                    Frameshift at Arg [illegible]
Ca2+                                Missense: Arg186 to Gln (NT)        Hyperparathyroidism, hypocalciuric    29, 30
                                    Missense: Glu298 to Lys (NT)        hypercalcaemia
                                    Missense: Arg796 to Trp (3IL)
                                    Missense: Glu128 to Ala (NT)
Parathyroid hormone (PTH type b)    Missense: His223 to Arg (1IL)       Short-limbed dwarfism                 31
β3-Adrenoceptor                     Missense: Trp64 to Arg (1IL)        Obesity, NIDDM                        32-34
Growth-hormone-releasing hormone    Nonsense: Glu72 to Stop (NT)        Dwarfism                              35
Adrenocorticotropin                 Missense: Ser74 to Ile (2TM)        Glucocorticoid deficiency             36
Glucagon                            Missense: Gly40 to Ser (NT)         Diabetes, hypertension                37, 38

Abbreviations: CT, carboxyl terminus; EL, extracellular loop; IL, intracellular loop; NIDDM, non-insulin-dependent diabetes mellitus; NT, amino terminus;
TM, transmembrane segment.



The cloning of cDNAs for well-known GPCRs led to the
discovery of a surprising number of paralogues (Ref. 5). The
existence of these novel receptor subtypes was unexpected
because the current cornucopia of pharmacological
agents does not possess the required selectivity to
distinguish all of them clearly, and thus an opportunity
for drug discovery was quickly recognized. Current
research efforts seek to define the physiology associated
with these novel receptor subtypes and to discover
highly selective compounds as potential pharmaceutical
drugs. These efforts are almost exclusively focused on
GPCRs for which activating ligands are known. Since
characterized GPCRs were, and continue to be, attractive
therapeutic targets, it is most reasonable to speculate that
many of the orphan receptors have similar potential. The
initial challenge is to determine the function of each
orphan receptor through the identification of activating
ligands and, once the function is clarified, link the orphan
receptor to a specific disease and thus establish it as a
candidate for a comprehensive drug discovery effort.

Reverse molecular pharmacology 

Until recently, research into the identification of 
GPCRs as targets for drug discovery has been conducted 
using the traditional approach illustrated in Fig. 2. For 
this strategy, the starting point is functional activity, 
which forms the basis of an assay by which a ligand is
identified through purification from biological fluids,
cell supernatants or tissue extracts. One example of the
success of this strategy is the discovery of the potent
vasoconstricting peptide endothelin (Ref. 39). Once isolated, the
ligand is used to characterize its cellular and tissue biology
as well as its pathophysiological role. Subsequently,
cDNAs encoding corresponding receptors are 'fished'
from gene libraries using a variety of methodologies (e.g.
receptor purification and expression cloning) that often
either directly or indirectly use the ligand as the 'hook'.
As the nucleotide sequences for GPCRs begin to accumulate
and be analysed, additional receptors can be
cloned by homology screening, by positional cloning,
and by polymerase chain reaction (PCR) methodologies
that use oligonucleotide primers based on nucleotide
sequences conserved within the seven transmembrane
domains of the GPCR family. Once the cloned human
receptor cDNA is expressed in a heterologous cell system (Ref. 40),
it is used, together with its ligand, to form the basis
of a screen to explore chemical compound libraries for
receptor antagonists or agonists. Lead structures identified
in the screen are refined through medicinal chemistry
using an iterative process. Resulting drug leads
with appropriate in vivo pharmacology are passed on
into the clinic for development.

Recently, this paradigm has changed radically with the 
introduction of a new reverse molecular pharmacological 

Fig. 2. Paradigm shift from classical to reverse molecular pharmacological approaches for drug
discovery.



strategy, shown diagrammatically in Fig. 2. Through both
traditional molecular cloning techniques and, more
recently, mass sequencing of expressed sequence tags
(ESTs) from cDNA libraries, it is now possible to identify
GPCRs through computational or bioinformatic
methodologies. The EST approach, initially proposed by
Sidney Brenner (University of Cambridge) and first
brought to large-scale practice by Craig Venter (The
Institute for Genomic Research), constitutes random,
single-pass sequencing of cDNAs randomly picked from a
collection of cDNA libraries, followed by extensive
bioinformatic analysis of the sequence to identify structural
signatures characteristic of GPCRs. Once new
members of the GPCR superfamily are identified, the
recombinantly expressed receptors are used in
functional assays to search for the associated novel ligands.
The receptor-ligand pair are then used for compound
bank screening to identify a lead compound that,
together with the activating ligand, is used for biological
and pathophysiological studies to determine the function
and potential therapeutic value of a receptor antagonist
(or agonist) in ameliorating a disease process. In
addition, clues as to therapeutic potential may involve
receptor genotyping of disease populations. Once a link
with a disease is finally identified, an appropriate compound
can be advanced for clinical study.

The reverse molecular pharmacological strategy is a 
far more daunting challenge and risky endeavour when 
compared with the more traditional approach, since the 
starting material for a drug discovery effort is simply an
orphan receptor of unknown function, with no apparent
relationship to a disease indication. However, the potential
reward of using this approach is that resultant drugs naturally
will be pioneer or innovative discoveries, and a
significant proportion of these unique drugs may be useful
to treat diseases for which existing therapies are lacking
or insufficient.

Screening strategy 

Figure 3 illustrates the generic strategy that we use 
for our reverse molecular pharmacological approach. In 
addition to the EST approach, which has yielded the 
majority of our collection of orphan receptors, we have 
also used a number of more traditional approaches such 
as low-stringency screening, using portions of known 
GPCRs as hybridization probes, as well as PCR-based 
methods. By these techniques we have succeeded in 
identifying more than 70 orphan receptors in addition to 
those already cited in the literature.

Since cDNAs identified by EST cloning are often incomplete,
northern hybridization analysis is used to establish
the tissue or cell pattern of mRNA expression of the
GPCRs. This information is used to identify the tissue or
cell cDNA libraries that are to be probed for full-length
clones and, significantly, to determine whether a receptor
is expressed in a particular normal or diseased tissue of
interest. A highly selective tissue expression pattern may
also provide a clue with respect to receptor function. Once
obtained, full-length GPCR clones are expressed in mammalian
cell lines and yeast model systems (see below) for
functional analysis. Xenopus oocytes may also be used for 
expression; however, low screening throughput limits 
their use to a secondary, confirmatory assay system. For 
mammalian cell expression, the human embryonic kidney 
(HEK) 293 cell line or Chinese hamster ovary (CHO) cells 
are frequently used. These cell types possess a large reper- 
toire of G proteins that are necessary for coupling to 
downstream effectors in situ. They also share a reliable 
history of positive functional coupling for a wide variety 
of known GPCRs. However, since receptor coupling 
cannot be accurately predicted from primary sequence 
data, orphan GPCRs may need to be expressed in a 
variety of cell lines to establish viable coupling. 

These heterologous expression systems form the basis 
for screening for an activating ligand. The success of 
establishing functional coupling of the recombinant 
receptor depends to a large extent on whether the recep- 
tor is properly expressed, which may be assessed by 
northern or Western blot analysis, and whether appro- 
priate G proteins and downstream effectors are present 
in the cell in which the receptor is expressed. There are 
several major technical challenges to be met in order to 
initiate ligand fishing. Because it is difficult to predict 
accurately the coupling specificity of orphan GPCRs
from their primary sequence, assays must be chosen
that will detect a wide range of coupling mechanisms.
These generally focus on changes in intracellular levels
of cAMP or Ca2+ but can also include more generic
measurements, such as metabolic activation of the cell
via the cytosensor microphysiometer (Ref. 41). Recently, it has
become possible to implement most of these screens in
high-throughput format by using fluorescent-based
assays and using charge-coupled device cameras and
reporter gene constructs that allow easy readout of the
assay on microtitre plates. Ever increasing throughput of
the assays will be necessary to screen large libraries.
However, this approach is somewhat cumbersome and
inefficient if all the assays described above have to be
used. Is it possible to funnel heterologous signal transduction
through a defined pathway? The prospect of an
assay for a single transduction pathway comes from the
observation that heterologous expression of the G protein
subunit Gα15/16 promoted coupling of various GPCR
subfamily members through activation of phospholipase
Cβ and likely Ca2+ mobilization (Refs 42,43). Although this
approach may not work universally, the diversity of the
GPCRs successfully coupled through Gα16 to phospholipid
metabolism suggests that this could be a useful
method to screen for orphan receptor activation. 

Once heterologous receptor expression is achieved 
and functional assays are in place, ligand fishing experiments
can be initiated. Although the homology with
known GPCRs is low, we nevertheless begin by screen- 
ing the orphans against known GPCR ligands; since the 
sequence homology between some subtypes of known 
receptors can be low (e.g. 30-40% between neuropeptide 
Y receptor subtypes), it is possible that new paralogue 
receptors for known ligands still remain to be discovered.
The next step is to search for novel activating
ligands by screening biological extracts obtained from 
tissues, biological fluids and cell supernatants. An ad- 
ditional option is screening libraries of compounds for 
activating ligands. Complex libraries of peptides or com- 
pound collections could be rich sources of 'surrogate' 
agonists that would promote receptor activation and 
coupling but are not endogenous ligands. The rationale 
for searching for surrogate agonists springs from a report 
that a nonpeptide agonist has been discovered for the 
angiotensin II receptor (Ref. 44). There is also an obvious precedent
for nonpeptide agonists for opioid receptors.
Screening of the very large libraries that will be generated 
by fractionation of biological extracts and by combinato- 
rial chemical synthesis requires that the functional 
assays used have not only a high throughput but are also 
robust, since false positives can be a significant problem. 
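
One way to put a number on the robustness requirement described above is a plate-level quality statistic computed from control wells. The sketch below uses the Z'-factor, a widely used screening metric that is not mentioned in the review and is introduced here only as an illustration: Z' = 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|. The function name `z_prime` and the sample readings are hypothetical.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor for one assay plate from control-well readings.

    `positive` and `negative` are sequences of raw signals (for example
    fluorescence counts) from agonist-stimulated and unstimulated control
    wells. Each needs at least two values so a sample standard deviation
    can be computed.
    """
    mu_pos, mu_neg = mean(positive), mean(negative)
    sd_pos, sd_neg = stdev(positive), stdev(negative)
    separation = abs(mu_pos - mu_neg)
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (sd_pos + sd_neg) / separation


if __name__ == "__main__":
    # Invented control readings, for illustration only.
    stimulated = [100.0, 102.0, 98.0]
    unstimulated = [10.0, 12.0, 8.0]
    print(round(z_prime(stimulated, unstimulated), 3))
```

Values close to 1 indicate well-separated controls with little scatter, which reduces the chance that noise alone pushes a well over a hit threshold; values near or below 0 suggest the assay would generate many false positives at screening scale.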

Examples are beginning to emerge from several 
efforts showing that progress has been made in characterizing
orphan GPCRs. A first example is the identification
of an orphan GPCR that functions as a calcitonin
gene-related peptide (CGRP) receptor (Ref. 45). CGRP is a peptide
of 37 amino acids, widely distributed in neurones,
and functions as a potent vasodilator. It may be involved
in migraine and has been implicated in non-insulin-dependent
diabetes mellitus because it promotes resistance
to insulin. An orphan GPCR EST was derived from
a human synovium cDNA library (Ref. 45). Sequence analysis
showed that the new GPCR has ~56% similarity to the
human calcitonin receptor and was hence originally 
expected to be a new subtype of the calcitonin receptor. 
The message for this novel receptor was expressed 

Fig. 3. Strategy for using orphan G protein-coupled receptors as targets for drug discovery.



predominantly in lung, which is known to be a relatively 
rich source of CGRP receptors. Following full-length 
cloning from a human lung library, the orphan receptor
cDNA was stably expressed in HEK293 cells. Both radioligand
binding using [125I]CGRP, as well as functional
assays of CGRP-stimulated cAMP accumulation,
demonstrated an appropriate pharmacological profile
for the expressed receptor similar to that observed with
endogenous CGRP receptors on human neuroblastoma
cells. In addition to identifying the CGRP receptor, the
reverse molecular pharmacology approach has also been
used to identify other orphan receptors, such as the
anaphylatoxin C3a receptor (Ref. 46).

The examples given above are for receptors with significant
homology to known GPCR superfamily members
and their activating ligands proved to be known
GPCR ligands. Will ligand fishing be successful in identifying
novel endogenous ligands? Recently, two groups

Fig. 4. Yeast-based screen for the identification of agonists for orphan G protein-coupled
receptors (GPCRs). a: Normal, endogenous GPCR signalling in yeast (Saccharomyces
cerevisiae). b: Substitution of a human GPCR and a human Gα subunit for yeast counterparts
and modification of downstream signalling pathways such that agonist stimulation of the
recombinant GPCR promotes growth. This yeast strain can be screened using biological
extracts or compound libraries, or both. c: Yeast cells can be engineered to secrete small
peptides from a random peptide library to identify autocrine surrogate peptide agonists for
recombinant orphan GPCRs. Modified from Ref. 49, with permission.



investigated an orphan opioid-like receptor, ORL1 (Refs 
47 and 48). Both groups expressed the orphan GPCR in 
CHO cells and challenged the transfected cells with a 
series of opiate agonists, but without response. Both 
groups then used a similar ligand fishing approach. 
Taking crude extracts from rat brain (Ref. 47) or porcine brain (Ref. 48),
they screened against the stably transfected cell lines 
using inhibition of adenylate cyclase activity as a func- 
tional assay. They were able to fractionate the brain 
extracts and identify the novel dynorphin-like ligand, 
which they called nociceptin (Ref. 47) or orphanin FQ (Ref. 48).
Thus, both teams successfully established a functional 
assay in transfected CHO cells that allowed the purifi- 
cation of a novel neuropeptide ligand that is 17 amino 
acids long for the orphan receptor. This work validates 
the ligand fishing approach for characterizing the func- 
tion of orphan GPCRs. 

Concluding remarks and future challenges 

Although orphan GPCRs have been around for over 
ten years, very few companies have, until recently, been 
willing to risk their resources to explore opportunities 
among this category of receptors. However, the environ- 
ment for the pharmaceutical industry has changed due to 
the confluence of several major technological advances. 
The conversion of gene sequences encoding GPCRs to 
drug targets is substantially aided by the development of 
combinatorial chemistry methods and miniaturized high- 
throughput screening techniques. The future challenge 
for drug discovery in this arena is to integrate these 
technologies innovatively and productively. One glimpse 
of the future comes from the field of functional genomics. 
The endogenous GPCR transduction system of the 
yeast, Saccharomyces cerevisiae, which is the pheromone 
pathway required for conjugation and mating, has been 
commandeered, through genetic engineering, to permit
functional expression and coupling of human GPCRs and
humanized G protein subunits to the endogenous signalling
machinery (Refs 49-51) (Fig. 4). Further manipulations
involve conversion of the normal yeast response to 
pheromone or activating ligand (growth arrest) to positive 
growth on selective media or to reporter gene expression. 
In addition, yeast cells have been engineered to express 
and secrete small peptides from a random peptide library 
that will permit the autocrine activation of heterologously 
expressed human GPCRs (Refs 49 and 51). This provides 
an elegant means of screening rapidly for surrogate pep- 
tide agonists that activate orphan receptors. This yeast 
system is, of course, not limited to autocrine ligand screen- 
ing but can also be used in high-throughput mode to 
screen directly the fractions from biological extracts and 
the various chemical libraries as described above. A major
advantage of the yeast system over the mammalian 
heterologous expression systems is its ease of use and its 
lack of endogenous GPCRs, which can confound ligand 
fishing expeditions in mammalian cells. 

There is now tremendous pressure to be the first on 
the market with highly selective drugs that target thera- 
peutic areas of unmet medical need and ideally have 
novel mechanisms of action. As a consequence, the 
pharmaceutical industry has recognized the power of 
genomics to provide it with new and unique drug tar- 
gets. Genomics has responded with a plethora of novel 
proteins, included among them over 100 orphan GPCRs. 
Because of the proven link of GPCRs to a wide variety of 
diseases and the historical success of drugs that target 
GPCRs, we believe that these orphan receptors are 
among the best targets of the genomic era to advance 
into the drug discovery process. 

CA1A2X-competitive
inhibitors of 
farnesyltransferase as 
anti-cancer agents 

Charles A. Omer and Nancy E. Kohl 

For Ras oncoproteins to transform mammalian cells, 
they must be post-translationally farnesylated in a 
reaction catalysed by the enzyme farnesyl-protein 
transferase (FPTase). Inhibitors of FPTase have 
therefore been proposed as anti-cancer agents. In this 
review Charles Omer and Nancy Kohl discuss the 
development of FPTase inhibitors that are kinetically 
competitive with the protein substrate in the 
farnesylation reaction. These compounds are potent
and selective inhibitors of the enzyme that block the 
tumourigenic phenotypes of ras-transformed cells and 
human tumour cells in cell culture and in animal 
models. 

Since the identification of farnesyl-protein transferase 
(FPTase) activity in mammalian cells, there has been an 
intense effort to develop inhibitors of this housekeeping 
enzyme for use as potential novel anti-cancer agents 1,2. 
This idea stems from the fact that several of the proteins 
that regulate mammalian cell proliferation require a 
post-translational modification catalysed by this enzyme 
for biological activity. Efforts over the past eight years 
have yielded potent, cell-active inhibitors of FPTase 
that demonstrate antiproliferative activity in cell 
culture and in rodent models of cancer. 

The focus of the FPTase inhibitor (FTI) studies has 
been inhibition of the transforming activity of the Ras 
oncoproteins. Three ras genes, Ha-, N- and Ki-ras, encode 
four highly homologous, 21 kDa proteins, Ha-, N-, Ki4A- 
and Ki4B-Ras (Ki4A- and Ki4B-Ras are encoded by splice 
variants of the Ki-ras gene) 3. Ras functions to regulate the 
transduction of extracellular growth-promoting signals 
from membrane-bound receptor tyrosine kinases to 
intracellular growth-regulatory pathways. Typical of the 
low-molecular-weight G proteins, Ras is active when 
bound to GTP and inactive when bound to GDP. Cycling 
from the active to the inactive form is accomplished by 
the intrinsic GTPase activity of the protein. Mutations in 
Ras that abolish the GTPase activity result in constitu- 
tively active forms of the protein. Such oncogenically 
mutated forms of Ras, particularly Ki4B-Ras, are found 
in approximately 30% of many human cancers including 
90% of pancreatic cancers and 50% of colon cancers 4,5. 

Ras is synthesized as a biologically inactive, cytosolic 
protein that localizes to the inner surface of the plasma 
membrane where it acquires biological activity following 
a series of post-translational modifications (see Ref. 6 
for review). The first and obligatory step in this series is 
the transfer of a 15-carbon isoprenoid, farnesyl, from farnesyl 
diphosphate (FPP) to the sulphur atom of the cysteine 
residue located four amino acids from the carboxyl 
terminus of the protein. This cysteine residue is part of 
the CA1A2X motif found in all FPTase protein substrates, 
where C is cysteine, A1 and A2 are usually aliphatic 
amino acids and X is usually serine, methionine, glutamine, 
alanine or cysteine. Following farnesylation, 
A1A2X is proteolytically cleaved and the now C-terminal 
farnesylcysteine is methylated. In the case of all of the 
Ras proteins except Ki4B-Ras, palmitate groups are then 
added to cysteine residues upstream of the farnesylated 
cysteine. The demonstration that farnesylation is essential 
for the transforming ability of the Ras oncoproteins 7-10 
has spurred the development of inhibitors of 
the enzyme that catalyses this reaction, FPTase, as anti-cancer 
agents. 
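
The motif rule described above can be illustrated with a short sketch. This is not a method from the review; it is a minimal Python example that checks whether a protein sequence ends in a C-A1-A2-X pattern. The set of residues treated as "aliphatic" (A, V, L, I) is an assumption made for illustration, since the text says only that A1 and A2 are "usually aliphatic"; the allowed X residues are those listed in the text. The function names and the test sequence are hypothetical.

```python
import re

# Assumption: "aliphatic" is taken here to mean A, V, L or I. The review only
# says the A1 and A2 positions are "usually aliphatic".
ALIPHATIC = "AVLI"
# X residues listed in the text: serine, methionine, glutamine, alanine, cysteine.
X_RESIDUES = "SMQAC"

# Cysteine, two aliphatic residues, then X, anchored at the C-terminus.
CAAX_PATTERN = re.compile(rf"C[{ALIPHATIC}][{ALIPHATIC}][{X_RESIDUES}]$")


def has_caax_motif(protein_sequence: str) -> bool:
    """Return True if the last four residues match the C-A1-A2-X pattern."""
    return CAAX_PATTERN.search(protein_sequence.strip().upper()) is not None


def farnesylated_cysteine_index(protein_sequence: str):
    """Return the 0-based index of the cysteine located four residues from
    the C-terminus, or None if the sequence has no matching motif."""
    if not has_caax_motif(protein_sequence):
        return None
    return len(protein_sequence.strip()) - 4


if __name__ == "__main__":
    toy = "MTEYKCVLS"  # hypothetical toy sequence, not a real protein
    print(has_caax_motif(toy), farnesylated_cysteine_index(toy))
```

Because the text qualifies both the A and X positions with "usually", a strict pattern such as this one would miss some genuine substrates; it is meant only to make the positional rule concrete.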

FPTase is a ubiquitously expressed, cytosolic enzyme 
comprised of two subunits, a 45 kDa α subunit and a 
48 kDa β subunit 6. Cross-linking studies have shown 
Structure and Functional Analysis of G Protein-Coupled Receptors and Potential 
Diagnostic Ligands 

Claire M. Fraser 

The Institute for Genomic Research, Gaithersburg, Maryland 



G protein-coupled receptors are a diverse class of proteins 
that mediate signal transduction across the plasma mem- 
brane. More than 200 receptors in this extended gene family 
have been cloned, and comparison of the deduced amino- 
acid sequences indicates that these proteins have marked 
homology and share a common membrane topology con- 
sisting of seven transmembrane helices. Although there is 
considerable variability in the physiologic ligands responsi- 
ble for receptor activation, all receptors in this group interact 
with trimeric, guanine nucleotide-binding proteins to initiate 
signaling cascades in the cell cytosol. To investigate the 
structural motifs responsible for ligand binding, we have established 
a model system to express heterologously human 
G protein-coupled receptors in a mammalian cell line. This 
experimental system allows each receptor subtype to be 
studied in isolation and provides a direct means to link re- 
ceptor activation to a particular second messenger cascade. 
Furthermore, the efficacy and specificity of new pharmaceu- 
ticals can now be evaluated readily with cloned human re- 
ceptors, eliminating the need for animal tissues. We have 
used this expression system in conjunction with an experi- 
mental strategy of site-directed mutagenesis to identify 
amino-acid residues that have a functional role in ligand 
binding. Because of the strong homology that exists within 
this family of receptor proteins, the results of this work are 
applicable to other systems and, therefore, can help to es- 
tablish a more complete understanding of ligand-receptor 
interactions. This combined molecular and biochemical ap- 
proach to the study of G protein-coupled receptors can pave 
the way for the development of isoform-specific ligands that 
may be used for radionuclide imaging and therapy. 

J Nucl Med 1995; 36(Suppl):17S-21S 



Cell surface receptors are integral membrane proteins 
that connect external stimuli to biochemical changes 
within the cell. These proteins can be grouped into three 
superfamilies based on their primary structures and mechanisms 
of action: 

1. Receptors that bind growth factors. 

2. Ligand-gated ion channels, such as the nicotinic, 
gamma-aminobutyric acid (GABA) and glycine re- 
ceptors. 

3. Receptors that interface with guanine nucleotide- 
binding regulatory proteins. 

The third group, G protein-coupled receptors, is a diverse 
collection of proteins that includes distinct receptor subfamilies 
activated by peptide hormones, neurotransmitters, 
or environmental stimuli (Table 1). 

Although G protein-coupled receptors have different 
physiologic activators, they have two unifying character- 
istics: 

1. Each protein contains seven stretches of high hydrophobicity 
that appear to form membrane-spanning 
segments. Therefore, all receptors in this class are 
thought to share a similar membrane topology, anal- 
ogous to the structure of bacteriorhodopsin (Fig. 1). 
This proposed topology has been confirmed for both 
rhodopsin (1) and the beta-adrenergic receptor (2) 
through the use of antipeptide antibodies directed 
against specific regions of the receptor protein. 

2. In each system, receptor stimulation causes the acti- 
vation of a trimeric G protein on the cytosolic sur- 
face of the plasmalemma (3). Interaction with a G 
protein, therefore, is the common primary step of 
each signalling cascade. In the activated state, the 
G alpha subunit dissociates from the beta-gamma 
complex. Diversification of the biochemical re- 
sponse is caused by the subsequent modulation of 
additional effector enzymes by G alpha (Fig. 2). 
These downstream elements may include: phospho- 
lipases A, C, or D; adenylate or guanylate cyclase; 
or other proteins, such as ion channels. 
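
The first characteristic above, seven stretches of high hydrophobicity, is the kind of feature that a sliding-window hydropathy scan is commonly used to detect. The sketch below is an illustration only and is not the procedure used in this article. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6, which are conventional choices for finding candidate membrane-spanning segments; the function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the article does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.
    Unknown residue codes are scored as 0.0."""
    scores = [KD.get(residue, 0.0) for residue in sequence.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(sequence, window=19, threshold=1.6):
    """Return (start, end) ranges (0-based, end exclusive) covered by windows
    whose mean hydropathy exceeds the threshold. Overlapping or touching
    windows are merged into a single segment."""
    segments = []
    for i, score in enumerate(hydropathy_profile(sequence, window)):
        if score > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical toy sequence with two hydrophobic stretches.
    toy = "D" * 30 + "L" * 25 + "D" * 30 + "I" * 25 + "D" * 30
    print(hydrophobic_segments(toy))
```

For a receptor in this family, a scan of this kind would be expected to report about seven segments, although the count depends on the window and cutoff chosen, and a hydropathy plot alone does not establish topology.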

Pharmacologic analysis over the past 10 to 15 yr sug- 
gested that many receptor classes were, in fact, a group 
of closely related isoforms. This premise was supported 
by the observation that a specific ligand, such as acetylcholine, 
could elicit distinct biochemical responses in different 
tissues. Moreover, the sensitivity of receptors to 
TABLE 1 

Membrane Receptors That Interact with G Proteins 

Peptide Hormone Receptors 
  Angiotensin 
  Adrenocorticotropin (ACTH) 
  Antidiuretic hormone 
  Bombesin 
  Bradykinin 
  Calcitonin 
  Cholecystokinin (CCK) 
  C5a anaphylatoxin 
  Corticotropin-releasing hormone (CRF) 
  Endothelin 
  Gastrin 
  Glucagon 
  Glucagon-like peptide 
  Gonadotropin-releasing hormone (GnRH) 
  Growth hormone-releasing hormone (GRF) 
  Interleukin-8 
  Kinins (bradykinin, substances P and K) 
  Luteinizing hormone (LH) 
  Melanocortin 
  Melanocyte-stimulating hormone (MSH) 
  N-formyl peptide 
  Neuropeptide tyrosine (NPY) 
  Neurotensin 
  Opiates 
  Oxytocin 
  Parathyroid hormone 
  Pituitary adenylate cyclase-activating protein 
  Secretin 
  Somatostatin 
  Thyrotropin-releasing hormone (TRH) 
  Vasoactive intestinal polypeptide (VIP) 
  Vasopressin 

Glycoprotein Hormone Receptors 
  Choriogonadotropin 
  Follicle-stimulating hormone (FSH) 
  Thyrotropin (TSH) 

Neurotransmitter Receptors 
  Adenosine 
  Adenosine triphosphate (ATP) 
  Alpha-Adrenergic 
  Beta-Adrenergic 
  Dopamine 
  Gamma-aminobutyric acid (GABA) 
  Glutamate 
  Histamine 
  Muscarinic acetylcholine 
  Octopamine 
  Serotonin 
  Tyramine 

Sensory Systems 
  Vision (rhodopsins) 
  Olfaction 

Other Agents 
  Cannabinoids 
  Immunoglobulin E (IgE) 
  Mas oncogene 
  Platelet-activating factor 
  Prostanoids 
  Thrombin 



proteins in vivo, that the answers to these questions will 
require the development of new experimental systems that 
can ascertain the properties of each receptor subtype. 

HETEROLOGOUS EXPRESSION 

We have used an experimental system in which cloned 
G protein-coupled receptors are stably transfected and 
expressed in a mammalian cell line (5). Heterologous 
expression of receptor proteins has two major advantages: 
analysis of a single receptor subtype in isolation and study 
of drug interactions with human receptors, eliminating 
the need for animal tissues in drug screening protocols. 
Although our research has focused on the muscarinic ace- 
tylcholine receptor, the observations concerning receptor- 
ligand interaction are relevant to any one of a number 
of G protein-coupled receptors. Hence, within this gene 
superfamily, there exists some commonality of structure 
and function. 

Five distinct muscarinic receptor genes have been 
cloned and sequenced (rJ) and have been designated m1 
through m5. The m1, m3 and m5 subtypes preferentially 
stimulate phosphoinositide hydrolysis in response 



agonists or antagonists varied with the experimental mate- 
rial. These early observations have been confirmed with 
the cloning of over 200 genes that encode G protein- 
coupled receptors (4). Comparison of the predicted pro- 
tein sequences illustrated that most receptors are part of 
a multigene family that may include as many as six iso- 
forms (4). In addition, low-stringency screening and the 
application of new molecular cloning techniques have led 
to the identification of novel receptor subtypes that were 
not previously anticipated from pharmacologic studies. 

These findings highlight one of the most challenging 
problems in the development of useful drugs for radionu- 
clide imaging and therapy: How can pharmaceuticals be 
designed and tested that are specific for a particular recep- 
tor isoform? To address this problem adequately, it is 
essential to answer the following questions: Which second 
messenger cascade is elicited by a particular receptor sub- 
type? How is the response affected by different agonists? 
It is equally apparent, from the heterogeneity of receptor 
FIGURE 1. Schematic illustration of cell membrane topology 
of G protein-coupled receptors with seven stretches of membrane-spanning 
segments with high hydrophobicity. [Reprinted 
with permission from: Lee NH, Fraser CM. Identifying the functional 
domains of G protein-coupled receptors. In: Krogsgaard-Larsen 
P, Christensen S, Kofod H, eds. New leads and targets 
in drug research: Alfred Benzon symposium no. 33. Copenhagen: 
Munksgaard; 1992:187-199.] 
FIGURE 2. Schematic illustration of the signal transduction mechanisms common to G protein-coupled receptors. Agonist 
binding to G protein-coupled receptors promotes receptor coupling to various heterotrimeric G proteins, which catalyze the 
exchange of bound guanosine diphosphate (GDP) for guanosine triphosphate (GTP) on the G protein alpha subunits. Binding of 
GTP to the alpha subunits results in dissociation of the G protein heterotrimer complex. Depending on the receptor and the G 
protein with which it is associated, the Gα-GTP subunit activates (+) or inhibits (-) one or more intracellular effector enzymes, 
leading to metabolic changes in the cell. Acetylcholine binds to subtypes of muscarinic acetylcholine receptors indicated as M1 
and M2. Norepinephrine and epinephrine bind to subtypes of alpha- (α) and beta- (β) adrenergic receptors. The heterotrimeric G 
proteins are composed of alpha, beta and gamma subunits. Effector enzymes stimulated or inhibited by G protein-coupled 
receptors include: (a) adenylate cyclase (AC), which converts adenosine triphosphate (ATP) to cyclic adenosine monophosphate 
(cAMP) and activates protein kinase A (PKA); (b) phospholipase C (PLC), which hydrolyzes inositol phospholipids to produce 
inositol phosphates (IP3) and diacylglycerol (DG) [Inositol phosphates increase the levels of intracellular calcium and diacylglycerol 
stimulates protein kinase C (PKC) activity]; (c) phospholipase A2, which hydrolyzes membrane lipids to produce arachidonic acid; 
and (d) various ion channels, which modulate ion flow across the cell membrane. 



to agonist binding, whereas the m2 and m4 subtypes 
preferentially inhibit adenylate cyclase (6). Each receptor, 
however, can activate more than one intracellular 
signaling pathway under the appropriate conditions. 
For example, the phosphoinositide-coupled muscarinic 
receptors have been shown to mediate an increase in 
intracellular cyclic adenosine monophosphate (cAMP) 
and also may stimulate the release of arachidonic acid 
from membranes (7,8). 

Equally interesting are the differences observed in the 
magnitude of the responses elicited by receptors that 
activate the same second messenger cascade (7). The m1 
and m3 isoforms both stimulate the phosphoinositide 
pathway. Yet, comparison of the m1 and m3 subtypes, 
expressed in Chinese hamster ovary (CHO) cells, illustrated 
that the phosphoinositide response evoked by agonist 
binding to the m1 muscarinic receptor was always 
greater than that observed with the m3 receptor. These 
differences were not due to dissimilarities in the level of 
gene expression since both receptors were present at the 
plasma membrane in equivalent densities. It is not clear 
whether this difference reflects the coupling of these two 
receptor subtypes to distinct G proteins or a differential 
coupling to a single G protein. Nevertheless, these obser- 
vations suggest that there may be physiologically relevant 
differences in the coupling of receptor isoforms to the 
same biochemical pathway. 



We have also observed agonist-specific activation of 
intracellular signalling pathways (9). Three muscarinic 
agonists — carbachol, pilocarpine, and AF102B — were 
examined for their ability to stimulate phosphoinositide 
hydrolysis, cAMP production and arachidonic acid release 
from CHO cells transfected with the m1 muscarinic 
receptor. Carbachol and pilocarpine produced maximal 
stimulation of phosphoinositide hydrolysis. This response 
was greater than the phosphoinositide hydrolysis elicited 
by AF102B. Similar results were found when arachidonic 
acid release was monitored. In contrast, only carbachol 
produced an increase in the level of cytosolic cAMP, 
whereas pilocarpine and AF102B had no effect on this 
pathway. These data support findings from other studies 
with m1 muscarinic receptors (10). 

Comparison of the chemical structure of these ago- 
nists suggests one plausible explanation for the diverse 
response of the m1 receptor: Carbachol is the compound 
with the most flexibility since it can assume four 
or five conformational states that have similar energy 
minima (9). Multiple conformational states may allow 
distinct ligand-receptor interaction, which might ac- 
count for the diversity observed in the biochemical re- 
sponse. Pilocarpine and AF102B, on the other hand, 
have more rigid chemical structures, which may limit 
the ability of these compounds to stimulate completely 
the m1 receptor. Interestingly, it has been postulated 
that the ability of AF102B to function as a partial agonist 
may be important therapeutically because patients 
may not develop tolerance to this compound (9). This 
drug is currently in clinical trials in Japan as a treatment 
for Alzheimer's disease. 

DOWNREGULATION 

The phenomenon of receptor downregulation and its 
relation to drug tolerance is a serious problem in the 
development of therapeutics. It has been shown that long- 
term incubation of many G protein-coupled receptors with 
an agonist produces a reduction in the number of receptor 
proteins at the cell surface (4). This process is called 
receptor downregulation. One clinical manifestation of 
this phenomenon is tachyphylaxis, observed in asthma 
patients who use beta-adrenergic agonists, such as 
bronchial dilators. Chronic administration of these beta- 
adrenergic agonists may cause patients eventually to be- 
come refractory to the agonist's beneficial effects. 

We have examined the biochemical and molecular fea- 
tures of receptor downregulation in CHO cells transfected 
with the m1 muscarinic acetylcholine receptor, the α2-adrenergic 
receptor, and the β-adrenergic receptor (11). 
Following 24-hr incubation with carbachol, a muscarinic 
agonist, transfected CHO cells showed a reduction in the 
magnitude of phosphoinositide hydrolysis elicited by re- 
application of carbachol in comparison to cells that had 
no prior carbachol exposure. Interestingly, the addition 
of isoproterenol, a beta-adrenergic agonist, also caused a 
reduction in the carbachol-induced phosphoinositide re- 
sponse of the muscarinic receptor. Therefore, long-term 
activation of either G protein-coupled receptor (the mus- 
carinic or the beta-adrenergic receptor) caused downregu- 
lation of the muscarinic receptor. The reduction in recep- 
tor density at the cell surface correlated with a decrease 
in the level of messenger ribonucleic acid (mRNA) spe- 
cific for the muscarinic receptor. 

These observations indicate that receptor downregula- 
tion is in part a biochemical feedback process that reduces 
the level of gene transcription in response to receptor 
stimulation. These findings may be important in the long- 
term therapy of diseases with some of these agonists and 
represent a possible utility for radionuclide imaging as a 
technique to monitor changes in receptor levels in target 
tissues. 

SITE-DIRECTED MUTAGENESIS 

Along with the biochemical analysis of G protein-cou- 
pled receptors, we have used this heterologous expression 
system to identify structures within receptor protein that 
have functional importance (4). These studies employed 
an experimental strategy of site-directed mutagenesis fol- 
lowed by expression of the mutant receptor protein in 
transfected cells to define regions responsible for ligand 
binding and receptor activation by agonists. A similar 



strategy has been utilized in other laboratories to determine 
receptor domains that interact with G proteins (4) 
and amino acid residues that undergo post-translational 
modifications, such as glycosylation, which may be essential 
for normal receptor function (4). 

We have focused on amino acid residues that are highly 
conserved among all G protein-coupled receptors and positioned 
toward the extracellular membrane surface where 
ligand-binding is thought to occur. One caveat to this 
experimental approach is the possibility that a point mutation 
will cause a large-scale conformational change in the 
protein. In such a case, receptor inactivity may be caused 
by protein misfolding and not because the mutated residue 
had a critical role in receptor function. To minimize this 
problem, we have made conservative amino acid substitutions, 
replacing the original residue with one of similar 
size and/or hydrophobicity. 

One of the striking features of most receptors in this 
family is the presence of two conserved cysteine residues 
(4), one in the extracellular loop between helices II and 
III and a second in the extracellular loop between helices 
IV and V (Fig. 1). Biochemical evidence from a number 
of G protein-coupled receptor systems has suggested that 
these cysteines may form a disulfide linkage, covalently 
connecting the two extracellular loops (4,12). We have 
made mutations at each position, changing the cysteine 
to a serine residue, in the muscarinic acetylcholine receptor. 
In each case, the transfected cells expressed the mutant 
receptor, as evidenced by Northern analysis (13), but 
no agonist-mediated increase in phosphoinositide hydrolysis 
could be observed. 

These results confirm the earlier biochemical studies 
and also suggest that disulfide formation is essential for 
maintaining the correct protein conformation required for 
recognizing ligands and receptor activation. 

The precise location of the ligand binding-site has yet 
to be determined. Earlier work implied that ligands were 
bound within the transmembrane domains, since large 
deletions in the beta-adrenergic receptor could be made 
in either the extracellular or cytosolic loops without affecting 
ligand-receptor association (14). In light of these 
findings, we began to look at these domains and specific 
amino acids within the transmembrane helices, asking 
whether these residues had a role in ligand binding. Alignment 
of the deduced amino acid sequences from a number 
of G protein-coupled receptors revealed that a single 
aspartic acid residue within helix III is absolutely conserved 
among all receptors that bind ligands with a positively 
charged nitrogen. Examples include the following 
proteins: muscarinic receptors that bind acetylcholine, adrenergic 
receptors that bind epinephrine and norepinephrine, 
dopamine receptors, serotonin receptors, and histamine 
receptors. Moreover, it has been postulated that this 
negatively charged aspartic acid may play a role in binding 
the positively charged nitrogen common among these 
ligands (15). 
We have examined the role of this aspartic acid by 
mutating it to an asparagine in beta- and alpha-adrenergic 
receptors and in the muscarinic acetylcholine receptor. All 
three mutant receptors were unable to bind radiolabeled 
ligands, whereas the wildtype proteins displayed a high- 
affinity, saturable binding of the appropriate compound 
(16). Our findings corroborate results published by Hulme 
et al. (17), which determined that this same aspartic acid 
residue in the muscarinic receptor was covalently linked 
to the radioactive affinity-probe, propylbenzilylcholine 
mustard. 

Work from our laboratory and others has also implicated 
transmembrane threonine, tyrosine and cysteine residues 
in agonist binding, although it is not yet known 
whether any of these residues directly participate in 
receptor-ligand interactions (18,19). All of these amino acids 
are located in the same plane of the membrane, within 
one to two turns of the alpha helix from the extracellular 
surface, supporting the idea that agonist binding may oc- 
cur within the upper third of the transmembrane helices. 

CONCLUSION 

A combined approach of heterologous gene expression 
and site-directed mutagenesis provides a starting point for 
future structure-function analysis of G protein-coupled 
receptors. These studies, along with efforts toward ob- 
taining a receptor crystal structure, may make it possible 
to design more selective ligands for radionuclide imaging 
and therapy. 
Summary 

Bile acids repress the transcription of cytochrome 
P450 7A1 (CYP7A1), which catalyzes the rate-limiting 
step in bile acid biosynthesis. Although bile acids acti- 
vate the farnesoid X receptor (FXR), the mechanism 
underlying bile acid-mediated repression of CYP7A1 
remained unclear. We have used a potent, nonsteroi- 
dal FXR ligand to show that FXR induces expression 
of small heterodimer partner 1 (SHP-1), an atypical 
member of the nuclear receptor family that lacks a 
DNA-binding domain. SHP-1 represses expression of 
CYP7A1 by inhibiting the activity of liver receptor ho- 
molog 1 (LRH-1), an orphan nuclear receptor that is 
known to regulate CYP7A1 expression positively. This 
bile acid-activated regulatory cascade provides a 
molecular basis for the coordinate suppression of 
CYP7A1 and other genes involved in bile acid biosyn- 
thesis. 

Introduction 

Cholesterol is essential for a number of cellular func- 
tions, including membrane biogenesis and steroid hor- 
mone and bile acid biosynthesis. However, in excess, 
cholesterol can contribute to disease processes such 
as atherosclerosis and gallstone formation. Therefore, 
cholesterol biosynthesis and catabolism must be coor- 
dinately regulated. The metabolism of cholesterol to bile 
acids represents a major pathway for its elimination 
from the body, accounting for approximately half of daily 
excretion. Cytochrome P450 7A1 (CYP7A1) is a liver-specific 
enzyme that catalyzes the first and rate-limiting 
step in one of the two pathways for bile acid biosynthesis 
(Chiang, 1998; Russell and Setchell, 1992). The gene 
encoding CYP7A1 is regulated by a variety of small, 
lipophilic molecules, including steroid and thyroid hor- 
mones, cholesterol, and bile acids. Notably, CYP7A1 
expression is stimulated by cholesterol feeding and re- 
pressed by bile acids. Thus, CYP7A1 is under both feed- 
forward and feedback regulation. 
CYP7A1 expression is regulated by several members 
of the nuclear receptor superfamily of ligand-activated 
transcription factors (Chiang, 1998; Gustafsson, 1999; 
Russell, 1999). Recently, two nuclear receptors, the liver 
X receptor α (LXRα; NR1H3) (Apfel et al., 1994; Willy et 
al., 1995) and farnesoid X receptor (FXR; NR1H4) 
(Forman et al., 1995; Seol et al., 1995), were implicated 
in the feedforward and feedback regulation of CYP7A1, 
respectively (Peet et al., 1998; Russell, 1999). Both LXRα 
and FXR are abundantly expressed in the liver and 
bind to their cognate hormone response elements as 
heterodimers with the 9-cis retinoic acid receptor 
RXR (Mangelsdorf and Evans, 1995). LXRα is activated 
by the cholesterol derivative 24,25(S)-epoxycholesterol 
and binds to a response element in the CYP7A1 promoter 
(Lehmann et al., 1997). Mice lacking LXRα do not 
induce CYP7A1 expression in response to cholesterol 
feeding (Peet et al., 1998). Moreover, these animals accumulate 
massive amounts of cholesterol in their livers 
when fed a high cholesterol diet. These data establish 
LXRα as the cholesterol sensor responsible for feedforward 
regulation of CYP7A1 expression. 

Bile acids stimulate the expression of genes involved 
in bile acid transport, such as the intestinal bile acid-binding 
protein (I-BABP), and repress CYP7A1 and other 
genes encoding enzymes involved in bile acid biosynthesis, 
such as CYP8B1, which converts chenodeoxycholic 
acid (CDCA) to cholic acid, and CYP27, which 
catalyzes the first step in the alternative, "acidic" pathway 
for bile acid synthesis (Russell and Setchell, 1992; 
Javitt, 1994; Russell, 1999). Recently, FXR was shown 
to be a bile acid receptor (Wang et al., 1999; Makishima 
et al., 1999; Parks et al., 1999). Several different bile 
acids, including CDCA and its glycine and taurine conjugates, 
bind and activate FXR at physiologic concentrations. 
Moreover, FXR response elements (FXREs) were 
identified in both the mouse and human I-BABP promoters 
(Grober et al., 1999; Makishima et al., 1999), which 
provided strong evidence that FXR mediates the positive 
effects of bile acids on I-BABP expression. Notably, 
the rank order of bile acids that activate FXR correlates 
with that for repression of CYP7A1 in a hepatocyte- 
derived cell line (Makishima et al., 1999). These data 
suggested that FXR also has a role in the negative ef- 
fects of bile acids on gene expression. However, since 
the region of the CYP7A1 promoter that is necessary 
for bile acid-mediated repression lacks a strong FXR- 
binding site (Chiang and Stroup, 1994; Chiang et al., 
2000), it seemed unlikely that this repression was a di- 
rect effect of FXR. Thus, the molecular mechanism for 
bile acid-mediated repression of CYP7A1 remained 
in question. 

In this report, we have used a potent, nonsteroidal FXR 
ligand to demonstrate that FXR regulates the hepatic 
expression of small heterodimer partner 1 (SHP-1; 
NR0B2), an atypical, orphan member of the nuclear re- 
ceptor family that lacks a DNA-binding domain (Seol et 
al., 1996). SHP-1 has been shown to bind to other nuclear 
receptors and to repress their transcriptional activities 
(Seol et al., 1996; Masuda et al., 1997; Johansson 
et al., 1999; Lee et al., 2000). We show that SHP-1 represses 
the CYP7A1 promoter through interaction with 
liver receptor homolog 1 (LRH-1; NR5A2), an orphan 
nuclear receptor that binds as a monomer to a response 
element in the CYP7A1 promoter and activates transcription 
(Becker-André et al., 1993; Galarneau et al., 
1996; Nitta et al., 1999). LRH-1 is a mammalian homolog 
of the Drosophila fushi tarazu F1 gene product, which 
regulates Drosophila metamorphosis (Lavorgna et al., 
1991; Broadus et al., 1999). Our findings define a novel 
regulatory cascade of three orphan nuclear receptors 
that provides a molecular basis for the coordinate re- 
pression of gene expression by bile acids. 

Results 

Identification of GW4064 as a Potent, 
Selective FXR Activator 

FXR was recently shown to be a receptor for CDCA as 
well as other bile acids (Makishima et al., 1999; Parks et 
al., 1999; Wang et al., 1999). However, these compounds 
bind to FXR with only micromolar affinities and at these 
concentrations also interact with other proteins, includ- 
ing bile acid-binding proteins and transporters. We 
sought to identify a potent, selective FXR ligand for use 
as a chemical tool in elucidating the genes regulated 
by FXR. Combinatorial libraries of compounds were 
screened using a ligand-sensing fluorescence reso- 
nance energy transfer assay that detects interactions 
between FXR and a peptide derived from the steroid 
receptor coactivator 1 (SRC-1) as previously described 
(Parks et al., 1999). Among the compounds that pro- 
moted an interaction between FXR and SRC-1 was the 
isoxazole GW4064 (Figure 1A), which bound to FXR with 
a half-maximal effective concentration (EC50) of 15 nM 
(Maloney et al., 2000). GW4064 activated mouse and 
human FXR with EC50 values of 80 and 90 nM, respectively, 
in CV-1 cells transfected with FXR expression 
vectors and a reporter plasmid containing two copies 
of an established FXR response element (FXRE) derived 
from the Drosophila heat shock protein 27 (hsp27) pro- 
moter (Forman et al., 1995) (Figure 1B). Thus, GW4064 
is ~1000-fold more potent than CDCA in activating FXR 
in CV-1 cells (Figure 1B). 

GW4064 was tested for selectivity against a panel 
of nuclear receptors. CV-1 cells were transfected with 
expression plasmids for various nuclear receptor-GAL4 
chimeras and the reporter plasmid (UAS)5-tk-CAT as 
previously described (Parks et al., 1999). GW4064 activated 
only the FXR-GAL4 chimera (Figure 1C). Thus, 
GW4064 is a highly selective activator of FXR. 

FXR Regulates SHP-1 Expression in the Liver 
GW4064 was exploited as a chemical tool to identify 
the genes regulated by FXR in the liver. Male Fisher rats 
were treated for 7 days with GW4064 or vehicle alone 
(methyl cellulose). Following treatment, RNA was pre- 
pared from the livers of GW4064- and vehicle-treated 
animals, and genes that were either induced or re- 
pressed by GW4064 treatment were determined using 
CuraGen GeneCalling™ differential gene expression 
technology (Shimkets et al., 1999). A comprehensive list
of the liver genes regulated by GW4064 will be published 
elsewhere. Interestingly, the gene that was most strongly 
induced by GW4064 treatment was that encoding the 
orphan nuclear receptor SHP-1. Northern analysis showed
that SHP-1 expression was increased ~6-fold in the
livers of GW4064-treated rats relative to vehicle-treated
animals (Figure 2A). 
Bile acids are known to repress the expression of

Figure 1. GW4064 Is a Potent, Selective Activator of FXR
(A) Chemical structure of GW4064.
(B) CV-1 cells were transfected with expression plasmids for human
or mouse FXR and the (hsp70EcRE)2-tk-LUC reporter plasmid
containing two copies of the hsp70 ecdysone response element
upstream of the thymidine kinase (tk) promoter and luciferase gene.
Transfected cells were treated with the indicated concentrations of
either GW4064 or CDCA. Open circles, mouse FXR and GW4064;
open triangles, human FXR and GW4064; closed circles, mouse FXR
and CDCA; closed triangles, human FXR and CDCA. Data points
represent the mean of assays performed in triplicate.
(C) CV-1 cells were transfected with expression vectors for various
GAL4-nuclear receptor ligand-binding domain chimeras and the
reporter plasmid (UAS)5-tk-CAT. Transfected cells were treated with
1 µM GW4064. Data represent the mean of assays performed in
triplicate ± S.D.

CYP7A1 as part of a regulatory feedback loop that con- 
trols the rate of their biosynthesis from cholesterol 
(Russell and Setchell, 1992; Russell, 1999). Two recent
studies implicate FXR in the repression of CYP7A1
(Makishima et al., 1999; Wang et al., 1999), although the
molecular mechanisms have remained unclear since the
CYP7A1 promoter does not contain a consensus FXRE
(Chiang et al., 2000). In parallel with our analysis of
SHP-1 expression, we examined whether GW4064 treatment
resulted in decreased CYP7A1 expression in male
Fisher rats. Rats treated with GW4064 showed a substantial
decrease in CYP7A1 mRNA levels (~4-fold, Figure 2A).
Thus, GW4064 mimics the well-documented

Figure 2. FXR Ligands Induce SHP-1 and Repress CYP7A1 Expression

(A) Total RNA was prepared from the livers of male Fisher rats treated
for 7 days with GW4064 or vehicle alone. Northern analysis was
performed using probes for rat SHP-1 and CYP7A1. Data represent
the mean (n = 3) ± standard error of the mean. The asterisk denotes
a statistically significant difference between vehicle- and GW4064-
treated animals; P < 0.05.

(B) Total RNA was prepared from primary rat or human hepatocytes
treated for 48 hr with the indicated concentrations of GW4064 or
vehicle alone. Northern analysis was performed using probes for
rat or human SHP-1, CYP7A1, or β-actin.

(C) Total RNA was prepared from primary human hepatocytes
treated for 48 hr with the indicated concentrations of CDCA. Northern
analysis was performed using probes for human SHP-1,
CYP7A1, or β-actin.



effects of naturally occurring FXR ligands, namely bile 
acids, on CYP7A1 expression. This observation pro- 
vides compelling evidence that FXR mediates feedback 
repression of CYP7A1 by bile acids. 

To substantiate the in vivo data and extend them to 
human hepatocytes, we examined whether SHP-1 and 
CYP7A1 expression were regulated by FXR in primary 
cultures of rat and human hepatocytes. Hepatocytes 
were treated with increasing concentrations of GW4064, 
and the levels of SHP-1 and CYP7A1 expression were

Figure 3. Identification of FXR Binding Sites in the Human, Rat, and
Mouse SHP-1 Promoters

(A) Alignment of the proximal regions of the human, rat, and mouse
SHP-1 promoters. The conserved IR1 FXR binding site is boxed.
Conserved nucleotides are indicated by asterisks.

(B) Electrophoretic mobility-shift assays were performed with in vitro
synthesized human FXR and/or human RXRα as indicated and
[32P]-labeled oligonucleotides containing the IR1 motif from the rat,
mouse, or human SHP-1 promoters or the mouse or human I-BABP
promoters. The positions of the shifted FXR/RXRα complex and free
probes are indicated.

(C) Electrophoretic mobility-shift assays were performed with in vitro
synthesized human FXR and/or human RXRα, a [32P]-labeled
oligonucleotide containing the human I-BABP FXRE, and either a 5-, 25-,
or 75-fold excess of unlabeled oligonucleotides containing the IR1
motifs from the human I-BABP promoter, the mouse, rat, or human
SHP-1 promoters, or a mutated derivative of the mouse SHP-1 IR1
motif (mSHPmut). The position of the shifted FXR/RXRα complex
is indicated.



examined by Northern blot analysis. GW4064 treatment 
markedly increased SHP-1 expression and decreased 
CYP7A1 expression in hepatocytes from both species 
in a dose-dependent fashion (Figure 2B). Similar results 
were obtained in human hepatocytes treated with the 
natural FXR ligand CDCA (Figure 2C). As expected, 
CDCA was less potent than GW4064 in its effects on
gene expression (compare Figures 2B and 2C). These 
data strongly suggest that FXR regulates SHP-1 and 
CYP7A1 expression in both human and rodent hepatocytes.
Notably, there was a striking reciprocal relationship
between the regulation of SHP-1 and CYP7A1

Figure 4. FXR Activates the Rat and Human
SHP-1 Promoters

HepG2 cells were transfected with the human
FXR expression plasmid and luciferase reporter
plasmids containing the proximal promoters
of the rat ([A], nucleotides -441 to
+19) or human ([B], nucleotides -572 to +10)
SHP-1 genes or the corresponding reporter
plasmids in which the IR1 elements had been
mutated (ΔIR1). Following transfection, cells
were treated for 48 hr with GW4064 (1 µM)
or CDCA (100 µM). Data represent the
mean ± S.D. of six individual transfections.

expression: GW4064 and CDCA repressed CYP7A1 expression
at the same concentrations that were required
to induce SHP-1 expression (Figures 2B and 2C). Since 
SHP-1 is known to heterodimerize with several other 
members of the nuclear receptor superfamily and to 
repress their transcriptional activity (Seol et al., 1996;
Masuda et al., 1997; Johansson et al., 1999), these data
raised the intriguing possibility that FXR-mediated in- 
duction of SHP-1 might underlie the repression of 
CYP7A1 expression (see below). 

FXR Binds and Activates SHP-1 Promoters 
We next sought to determine whether SHP-1 expression 
is directly regulated by FXR. FXR preferentially binds as 
a heterodimer with RXR to FXREs composed of two 
nuclear receptor half-sites of consensus AG(G/T)TCA 
organized as an inverted repeat and separated by a 
single nucleotide (IR1) (Forman et al., 1995). IR1-type 
FXREs have been identified in the human and mouse 
I-BABP promoters (Grober et al., 1999; Makishima et
al., 1999). The mouse, rat, and human SHP-1 promoters 
were examined for IR1 motifs. A highly conserved IR1- 
like element was identified ~300 nucleotides upstream 
of the transcription initiation site in the SHP-1 promoter
of all three species (Figure 3A). Electrophoretic mobility- 
shift analyses demonstrated that the FXR/RXR complex 
binds efficiently to the IR1 element from the SHP-1 pro- 
moter of each species (Figure 3B). In agreement with 
earlier observations (Grober et al., 1999), the FXR/RXR
heterodimer also bound to the mouse and human 
I-BABP FXREs (Figure 3B). Competition binding analyses
showed that these interactions were specific: no
competition was seen with a mutated derivative of the 
IR1 motif derived from the mouse SHP-1 promoter (Fig- 
ure 3C). 
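
The IR1 definition above (two AG(G/T)TCA half-sites arranged as an inverted repeat with a one-nucleotide spacer) lends itself to a simple sequence scan. The sketch below is an illustration only, not the method used in the paper: the function names, the mismatch tolerance, and the demo sequence are all assumptions introduced here.

```python
HALF_SITES = ("AGGTCA", "AGTTCA")            # consensus AG(G/T)TCA
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def half_site_mismatches(candidate, inverted=False):
    """Fewest mismatches between a 6-mer and the consensus half-site.

    With inverted=True the 6-mer is compared against the reverse
    complement of the consensus (TGA(A/C)CT).
    """
    targets = [reverse_complement(h) if inverted else h for h in HALF_SITES]
    return min(sum(a != b for a, b in zip(candidate, t)) for t in targets)


def find_ir1(seq, max_mismatches=0):
    """Return (start, 13-mer, mismatches) for each IR1-like element.

    The IR1 pattern is its own reverse complement (apart from the spacer),
    so scanning one strand is sufficient.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 12):
        window = seq[i:i + 13]               # 6 nt + 1 nt spacer + 6 nt
        mismatches = (half_site_mismatches(window[:6])
                      + half_site_mismatches(window[7:], inverted=True))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


demo = "ccgtAGGTCAaTGACCTgga"                # made-up sequence, not a real promoter
print(find_ir1(demo))                        # [(4, 'AGGTCAATGACCT', 0)]
```

Because natural FXREs are often "IR1-like" rather than perfect matches, raising `max_mismatches` to 1 or 2 would be the natural way to relax the scan, at the cost of more false positives.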

The presence of an FXR/RXR binding site suggested
that the SHP-1 gene is directly regulated by FXR. To 
test this hypothesis, HepG2 cells were transfected with 
an FXR expression plasmid and reporter plasmids expressing
luciferase under the control of either the rat or



human SHP-1 promoters. GW4064 treatment of cells 
transfected with the FXR expression plasmid and either 
promoter construct resulted in a marked induction of 
reporter activity (Figures 4A and 4B). Based on Northern 
blot analysis of SHP-1 expression (Figure 2B), the magnitude
of the response from the rat (7-fold) and human
(3-fold) SHP-1 promoters was somewhat lower than expected,
and it is possible that other promoter or enhancer
elements contribute to the regulation of SHP-1 expression.
Alternatively, additional factors present in well-differentiated
cultures of rat hepatocytes but not HepG2 cells
may be required for maximal FXR responsiveness. In
the absence of exogenously expressed FXR, the rat and
human SHP-1 promoters exhibited a modest (~1.5-fold)
induction on exposure to GW4064, which is most likely 
due to endogenous FXR in HepG2 cells (data not shown). 
FXR responsiveness was eliminated when mutations 
were introduced into the IR1 motifs in either the rat or 
human SHP-1 promoters (Figures 4A and 4B). These 
data provide strong evidence that SHP-1 expression 
is regulated directly by the FXR/RXR heterodimer in 
multiple species. 

SHP-1 Interacts with Orphan Nuclear 
Receptor LRH-1 

The finding that SHP-1 expression is regulated by FXR 
together with the reciprocal relationship between SHP-1 
and CYP7A1 regulation (Figure 2) suggested that SHP-1 
might play a pivotal role in bile acid-mediated repression 
of CYP7A1 expression. Regulation of the CYP7A1 pro- 
moter is complex and involves numerous transcription 
factors, including nuclear receptors with known ligands 
such as the thyroid hormone receptor (TR), retinoic acid 
receptor (RAR), RXR and LXRα, and the orphan receptors
COUP-TFII, HNF4α, and LRH-1 (Lehmann et al.,
1997; Stroup et al., 1997; Chiang, 1998; Peet et al., 1998;
Nitta et al., 1999; Russell, 1999; Stroup and Chiang,
2000). SHP-1 has previously been shown to bind to and
repress the transcriptional activities of TR, RAR, and 
RXR in the presence of their ligands and HNF4α in the

Figure 5. SHP-1 Interacts with the Orphan Nuclear Receptor LRH-1

(A) Mammalian two-hybrid experiments were performed in CV-1
cells cotransfected with expression plasmids for the GAL4-human
SHP-1 chimera and various VP16-nuclear receptor ligand-binding
domain chimeras. Transfection assays containing the LXRα-, FXR-,
RARα-, TRβ-, ERα-, and RXRα-GAL4 chimeras were performed in
the absence or presence of the indicated ligands [respectively: EPC,
24(S),25-epoxycholesterol (10 µM); GW4064 (1 µM); RA, all-trans
retinoic acid (0.1 µM); T3, triiodothyronine (0.1 µM); E2, estradiol (0.1
µM); 9-cis RA, 9-cis retinoic acid (0.1 nM)]. Data are expressed as
fold activation over cells transfected with the (UAS)5-tk-CAT reporter
alone and represent the mean of assays (n = 8) ± S.D.

(B) GST pull-down assays were performed with [35S]-labeled LRH-1
or RXRα in the presence of GST or GST-SHP-1 as indicated. 9-cis
retinoic acid (9-cis RA) was added to the binding reaction to a final
concentration of 10 µM.



absence of any exogenous ligand (Seol et al., 1996; 
Masuda et al., 1997). Using a mammalian two-hybrid 
approach, we examined whether SHP-1 interacts with 
these and other nuclear receptors that have been implicated
in the regulation of CYP7A1. CV-1 cells were transfected
with an expression plasmid for a GAL4-SHP-1
chimera, the (UAS)5-tk-CAT reporter, and expression
plasmids for chimeras between the strong transcriptional
activation domain of VP16 and the isolated ligand-binding
domains of a panel of nuclear receptors (Figure
5A). When transfected alone, the GAL4-SHP-1 chimera 
caused a minor reduction (~0.3-fold) in reporter activity 
(Figure 5A). However, reporter activity was strongly induced
when GAL4-SHP-1 was coexpressed with VP16-RXRα
(~44-fold) or VP16-estrogen receptor α (ERα,
~11-fold) in the presence of 9-cis retinoic acid and estradiol,
respectively (Figure 5A). These interactions were
strongly dependent on the presence of ligand. Little or
no interaction was detected between SHP-1 and LXRα,



FXR, COUP-TFII, HNF4α, RARα, or TRβ in our mammalian
two-hybrid assay (Figure 5A). The lack of a stronger
interaction between SHP-1 and either TRβ, RARα, or
HNF4α was surprising in light of the previous results of
others (Seol et al., 1996; Masuda et al., 1997) and may
reflect differences in the assay systems used. Notably,
strong reporter activity was detected when GAL4-SHP-1
was expressed with VP16-human LRH-1 or VP16-mouse
LRH-1 (~14-fold activation for both human and mouse).
This activity was completely dependent on the presence 
of GAL4-SHP-1 (data not shown). These data demon- 
strate that SHP-1 can interact with LRH-1 in cells. Inter- 
estingly, little or no interaction was detected between 
SHP-1 and steroidogenic factor 1 (SF-1) (Figure 5A), a 
closely related orphan receptor that shares ~60% amino 
acid identity with LRH-1 in the ligand-binding domain 
(Tsukiyama et al., 1992; Honda et al., 1993; Ikeda et al.,
1993). 

Using a glutathione S-transferase (GST) pull-down
assay, we examined whether SHP-1 binds directly to
LRH-1. SHP-1 was expressed in E. coli as a fusion protein
with GST, and [35S]-labeled LRH-1 was synthesized
in vitro. Glutathione-Sepharose beads efficiently coprecipitated
[35S]-labeled LRH-1 in the presence of GST-SHP-1
but not in its absence (Figure 5B). In parallel
incubations, GST-SHP-1 interacted strongly with [35S]-labeled
human RXRα in the presence of 9-cis retinoic
acid (Figure 5B). These data are in close agreement with 
those derived from mammalian two-hybrid experiments 
(Figure 5A). Thus, SHP-1 interacts directly with LRH-1. 

SHP-1 Represses Expression of CYP7A1 
Does SHP-1 have a role in the repression of CYP7A1 
expression by FXR ligands? We addressed this question 
by performing cotransfection experiments with a rat 
CYP7A1 luciferase reporter plasmid (pGL3-rCYP7A1 
[-1573/+36]) containing nucleotides -1573 to +36 of 
the rat CYP7A1 promoter, which includes a conserved 
LRH-1 binding site (Nitta et al., 1999). In the absence of 
exogenously expressed LRH-1, the activity of the
pGL3-rCYP7A1(-1573/+36) reporter was low when transiently
transfected into HepG2 cells (data not shown). Cotransfection
of increasing amounts of an LRH-1 expression
plasmid resulted in a dose-dependent increase in reporter
activity (Figure 6). This LRH-1-dependent reporter
activity was completely blocked by the cotransfection
of an SHP-1 expression plasmid (Figure 6). These data suggest
that interactions between SHP-1 and LRH-1 represent
a basis for bile acid-mediated repression of
CYP7A1 expression. 

Discussion 

The recent discovery that FXR is a bile acid receptor
provided a great deal of insight into the molecular mechanisms
underlying bile acid signaling. In particular, these
studies uncovered the mechanism whereby bile acids
stimulate the transcription of genes, such as I-BABP,
involved in bile acid transport. High-affinity binding sites
for the FXR/RXR heterodimer have been identified in
both the human and mouse I-BABP promoters (Grober
et al., 1999; Makishima et al., 1999). By contrast, the
mechanism underlying bile acid-mediated repression of
CYP7A1 expression remained a puzzle, since an FXRE
had not been identified in the bile acid response elements
of this gene (Chiang and Stroup, 1994; Chiang et

Figure 6. SHP-1 Represses LRH-1-Dependent Activation of the Rat
CYP7A1 Promoter

HepG2 cells were transfected with the rat CYP7A1 reporter plasmid,
pGL3-rCYP7A1(-1573/+36), and the indicated amounts of LRH-1
and/or SHP-1 expression plasmids. Data represent the mean of
assays performed in triplicate ± S.D.

al., 2000). We now present evidence that FXR does not 
repress CYP7A1 expression directly, but rather through 
induction of the gene encoding the orphan nuclear receptor
SHP-1, which, in turn, represses CYP7A1 expression.
Similar findings have been reported by Lu et al.
(2000 [this issue of Mol. Cell]). Consistent with this
model, it was recently shown that SHP-1 expression is
markedly lower and not inducible by cholic acid in the
livers of mice lacking FXR (Sinal et al., 2000). Taken
together, these data provide a molecular explanation 
for the coordinate suppression of gene expression by 
bile acids. 

SHP-1 Represses CYP7A1 Expression 
We encountered the orphan nuclear receptor SHP-1 as 
part of a comprehensive, unbiased effort to identify FXR 
target genes in the liver. SHP-1 expression was strongly 
induced in the livers of rats treated with the potent, 
nonsteroidal FXR ligand GW4064. SHP-1 expression 
was also markedly induced by GW4064 in primary cul- 
tures of human and rat hepatocytes, whereas CYP7A1 
expression was suppressed under the same conditions. 
The reciprocal relationship between SHP-1 and CYP7A1 
regulation, together with the established inhibitory ef- 
fects of SHP-1 on nuclear receptor activity, suggested 
that SHP-1 might repress CYP7A1 expression. Indeed, 
expression of SHP-1 repressed the activity of the rat 
CYP7A1 promoter in HepG2 cells. 
SHP-1 is unusual in that it lacks the highly conserved



DNA-binding domain typically found in members of the
nuclear receptor family. SHP-1 was originally cloned in
yeast two-hybrid experiments using the orphan nuclear
receptors CAR or PPARα as bait, but it interacts with a
number of additional nuclear receptors, including ERα
and ERβ, RAR, RXR, and TR (Seol et al., 1996; Masuda
et al., 1997; Seol et al., 1998; Johansson et al., 1999).
In each case, SHP-1 represses the ligand-induced transcriptional
activity of these receptors. How does SHP-1
repress transcription of the CYP7A1 promoter? Our data
indicate that SHP-1 exerts much of its effect through 
interaction with the orphan nuclear receptor LRH-1. 
SHP-1 interacted strongly with LRH-1 in both a mamma- 
lian two-hybrid assay and an in vitro pull-down assay. 
Moreover, SHP-1 efficiently repressed LRH-1-dependent
activation of the rat CYP7A1 promoter. LRH-1 was
recently shown to activate the human CYP7A1 promoter
by binding to an extended nuclear receptor half-site
sequence that is conserved in the mouse, rat, and hamster
CYP7A1 promoters (Nitta et al., 1999). Earlier studies
had defined DNA response elements in the CYP7A1
and CYP8B1 gene promoters that conferred repression 
in response to bile acids (Chiang and Stroup, 1994; 
Chiang et al., 2000; del Castillo-Olivares and Gil, 2000). 
Notably, each of these negative bile acid response ele- 
ments contains an LRH-1 binding site. Consistent with 
these data, CYP8B1 expression was repressed 3-fold 
in Fisher rats treated with GW4064 (S. A. J., unpublished 
data). Thus, interactions between SHP-1 and LRH-1 are 
likely to be important for the coordinate repression of 
a number of genes by bile acids. Among the genes that 
may be regulated by the interaction between SHP-1 and 
LRH-1 is SHP-1 itself. An LRH-1-responsive region of
the murine SHP-1 gene has been identified (Lee et al., 
1999). Thus, SHP-1 is likely to regulate its own expres- 
sion. This feedback regulation may provide a mecha- 
nism for attenuating the bile acid-mediated repression 
of genes by SHP-1. A model for bile acid-mediated repression
of gene expression via increased SHP-1 levels
is shown in Figure 7. 

Two recent reports showed that SHP-1 represses the 
transcriptional activation of ERα and ERβ, RXR, and the
orphan receptor HNF4α by competing with coactivator
binding to these receptors (Johansson et al., 1999; Lee
et al., 2000). In addition, SHP-1 contains a strong transcriptional
repressor domain in its C terminus (Lee et
al., 2000). Furthermore, SHP-1 has been shown to inhibit
DNA binding of RAR-RXR heterodimers (Seol et al.,
1996). Taken together, these studies suggest that SHP-1
inhibits the transcriptional activity of nuclear receptors 
through multiple mechanisms. To date, we have been 
unable to demonstrate inhibition of LRH-1 binding to its 
response element in the CYP7A1 promoter by SHP-1 
(data not shown). Thus, the mechanism by which SHP-1 




Figure 7. Model for the Feedforward and 
Feedback Regulatory Effects of Bile Acids on 
Gene Expression 

Activation of FXR by bile acids results in the
induction of I-BABP and SHP-1 expression.
SHP-1, in turn, interacts with LRH-1 and represses
expression of CYP7A1 and CYP8B1.
SHP-1 may also repress expression of its own
gene.

inhibits LRH-1-mediated transactivation of the CYP7A1
promoter remains unresolved. 

In addition to the interactions between SHP-1 and 
LRH-1, other mechanisms may play a role in bile acid- 
mediated repression of CYP7A1 expression. First, SHP-1 
binds to and represses the transcriptional activity of 
other nuclear receptors that regulate CYP7A1, including 
RXR and TR (Seol et al., 1996; Masuda et al., 1997).
These interactions may also contribute to bile acid- 
mediated repression of CYP7A1 expression. Second, 
ligand-bound FXR was reported to repress LXRα activity
on an LXRα response element (Wang et al., 1999), although
the mechanism for this trans-repression is not
clear. Since LXRα stimulates rodent CYP7A1 expression
in response to oxysterols, repression of LXRα activity
may contribute to the overall repression of CYP7A1.
Thus, SHP-1/LRH-1 interactions may be one of several
mechanisms whereby bile acids repress expression of
CYP7A1 and other genes. 

Parallels between SHP-1/LRH-1 and Other 
Nuclear Receptor Pairs 

Intriguing parallels exist between the SHP-1/LRH-1 interaction
and another pair of nuclear receptors. LRH-1
is most closely related to the orphan receptor SF-1, 
which regulates the expression of enzymes required for 
steroid hormone biosynthesis (Parker, 1998; Hammer 
and Ingraham, 1999). SF-1 and LRH-1 are ~85% identi- 
cal in the amino acid sequences of their DNA-binding 
domains, and both bind as monomers to the same ex- 
tended nuclear receptor half-site sequence. Notably, the 
transcriptional activity of SF-1 is repressed by binding 
to DAX-1 (dosage-sensitive sex-reversal adrenal hypo- 
plasia congenital region on the X chromosome, region 
1; NR0B1), an orphan nuclear receptor most closely 
related to SHP-1 that also lacks the DNA-binding domain 
characteristic of nuclear receptors (Zanaria et al., 1994; 
Hammer and Ingraham, 1999). Thus, both SF-1 and 
LRH-1 are negatively regulated in a trans-dominant fashion
by heterodimerization with orphan receptors lacking
DNA-binding domains. Since SHP-1 expression is stim- 
ulated by bile acids, it will be interesting to determine 
whether DAX-1 expression is also regulated by hor- 
mones. 

A second nuclear receptor pair with similarities to 
SHP-1/LRH-1 occurs in Drosophila. Hormonal activation
of the ecdysone receptor (EcR) during the third larval 
instar phase of Drosophila metamorphosis results in 
an increase in the expression of two orphan nuclear 
receptors, DHR3, which has a functional DNA-binding 
domain, and E75B, which does not. E75B binds to DHR3
and represses its transcriptional activity (Thummel, 
1997; White et al., 1997). This interaction is critical for
determining the temporal progression of metamorphosis.
The EcR/E75/DHR3 and FXR/SHP-1/LRH-1 regulatory
cascades are remarkably similar in that hormone-
mediated activation of a nuclear receptor (either FXR or 
EcR) induces expression of a second nuclear receptor, 
which, in turn, binds to and represses the activity of a 
third nuclear receptor. The similarities in these genetic 
hierarchies across evolution suggest that repression via 
heterodimerization may represent an important para- 
digm for the modulation of orphan receptor activity. 

Conclusions 

The mechanism whereby FXR represses expression of 
CYP7A1 and other genes has until now remained an 



enigma. Through the use of a potent, nonsteroidal FXR
ligand, we have identified SHP-1 as an FXR target gene
in the liver of humans and rodents. Furthermore, we
have demonstrated that SHP-1 can interact with LRH-1
and efficiently repress expression of CYP7A1. Thus, bile 
acid-induced repression of CYP7A1 is mediated by a 
novel regulatory cascade of three nuclear receptors. 
Since both the CYP7A1 and CYP8B1 gene promoters 
contain LRH-1 binding sites, the SHP-1/LRH-1 partner- 
ship is likely to have broad implications in bile acid 
signaling. Both SHP-1 and LRH-1 are orphan receptors, 
which raises the possibility that bile acid biosynthesis 
will be regulated by additional, unidentified hormones. 
Regardless of whether SHP-1 and LRH-1 have natural 
ligands, pharmacologic modulation of their interaction 
represents an exciting new opportunity for the discovery 
of drugs that regulate cholesterol homeostasis. 

Experimental Procedures 
Materials 

The synthesis of GW4064 will be described elsewhere (Maloney et 
al., 2000). CDCA, dexamethasone, estradiol, all-trans retinoic acid,
9-cis retinoic acid, and charcoal-stripped, delipidated calf serum
were acquired from the Sigma Chemical Co. (St. Louis, MO).
24(S),25-epoxycholesterol was synthesized in-house. DNA-modifying
enzymes, polymerases, and restriction endonucleases were
provided by Roche Molecular Biochemicals (Indianapolis, IN).
Charcoal/dextran-treated fetal bovine serum (FBS) was purchased from
Hyclone Laboratories Inc. (Logan, UT). The human hepatocellular
carcinoma cell line HepG2 was obtained from the American Type
Culture Collection (ATCC number HB-8065, Manassas, VA). Matrigel
was provided by Becton Dickinson Labware (Bedford, MA). All other 
tissue culture reagents were obtained from Life Technologies Inc. 
(Gaithersburg, MD). 

Animals 

Male Fisher rats were obtained from Charles River Laboratories Inc. 
(Raleigh, NC) and maintained on a 12 hr light/12 hr dark cycle.
Animals were allowed food and chow ad libitum. GW4064 (30 mg/ 
kg) was administered by gavage twice a day for 7 days and the 
animals sacrificed by cervical dislocation 4 hr after the final treatment.
Livers were excised and snap-frozen in liquid nitrogen. Differential
gene expression analysis was performed by CuraGen Corp.
(New Haven, CT). 

Plasmid Constructs 

Expression plasmids for the human nuclear receptor-GAL4 chimeras
were prepared by inserting amplified cDNAs encoding the
ligand-binding domains into a modified pSG5 expression vector
(Stratagene, La Jolla, CA) containing the GAL4 DNA-binding domain
(amino acids 1-147) and the Simian virus 40 (SV40) large T antigen
nuclear localization signal (APKKKRKVG). The (UAS)5-tk-CAT and
(hsp27EcRE)2-tk-LUC reporter constructs have been previously
described (Forman et al., 1995; Parks et al., 1999). pβ-actin-SPAP, an
expression vector containing the human secreted placental alkaline
phosphatase (SPAP) cDNA under the control of the β-actin promoter,
was used as an internal control in all transfections. The expression
plasmids for human and mouse FXR (pSG5-hFXR and pSG5-mFXR,
respectively) and human SRC-1 are described elsewhere (Kliewer
et al., 1998; Parks et al., 1999). The full-length coding regions for
human LRH-1 (GenBank Accession Number AB019246) and human
SHP-1 (GenBank Accession Number L76571) were amplified by PCR
and cloned into pSG5, creating pSG5-LRH-1 and pSG5-hSHP-1,
respectively. A consensus Kozak sequence was created during
amplification. The rat (bases -441 to +19, GenBank Accession
Number D86745) (Masuda et al., 1997) and human (bases -572 to
+10, GenBank Accession Number AF044316) (Lee et al., 1998)
SHP-1 promoters were amplified by PCR using the following primer
pairs: Rat, 5'-gggtQtpogag^tc*e(X7TG<OTGGCT
(sense) and 5'-gggtgtgcgagatctCCTGTTTCTTCCTGGCTCTGT
GGC-3' (antisense); and human, 5'-gggtgtgcgagatctTCCTAGACT
GGACAGTGGGCAAAG-3' (sense) and 5'-gggtgtgcgagatctCTTCC
AGCTCTCTGGCTCTGTGTT-3' (antisense). The resultant fragments
were inserted into the BglII site of pGL3-Basic, a promoter-less
luciferase reporter vector (Promega, Madison, WI). Site-directed
mutagenesis of putative FXREs in the rat and human SHP-1 promoters
was performed using the Transformer mutagenesis system
(CLONTECH Laboratories, Palo Alto, CA) with the ΔratIR1 (bases
-321 to -287, 5'-<X7KK5T ACAGCCTGGaaTAATAtaaCTG TTTATAC-3')
and ΔhumanIR1 (bases -304 to -270, 5'-CCTGGTACAGCCTGA
aaTAATGtaCTTGTTTATCC-3') primers. Mutated constructs were
verified to be free of nonspecific base changes by sequencing.
pGL3-rCYP7A1(-1573/+36) contains bases -1573 to +36 of the rat
CYP7A1 promoter (GenBank Accession Number Z14108) inserted
into the NheI site of pGL3-Basic. VP16-nuclear receptor chimeras
contain the 80 aa Herpes virus VP16 transactivation domain linked
to the ligand-binding domain of the following nuclear receptors in
a modified pSG5 expression vector: human COUP-TFII, ERα, LRH-1,
LXRα, RARα, and TRβ; mouse FXR, LRH-1, RXRα, and SF-1; and
rat HNF4α.

Transient Transfection Assays 

Transient transfection of CV-1 cells was performed exactly as de- 
scribed elsewhere (Jones et al., 2000). Typically, transfection mixes 
contained 2-5 ng of receptor expression vector, 20 ng of reporter
construct, and 8 ng of pβ-actin-SPAP. The amount of DNA used
in each transfection was adjusted to 80 ng with carrier plasmid
(pBluescript, Stratagene). Mammalian two-hybrid experiments utilized
transfection mixes containing 20 ng of VP16-nuclear receptor
ligand-binding domain expression vector, 5 ng of pSG5-GAL4-SHP-1,
15 ng of (UAS)5-tk-CAT, and 8 ng of pβ-actin-SPAP. Cells
were maintained for 24 hr in the presence of drug (added as a
1000x stock in dimethyl sulfoxide) in DMEM/F-12 nutrient mixture
containing 10% charcoal-stripped, delipidated calf serum. An aliquot
of medium was assayed for SPAP activity, and the cells were
lysed prior to determination of luciferase expression. Luciferase
activities were normalized to SPAP. HepG2 cells were maintained
in DMEM/F-12 supplemented with 10% heat-inactivated FBS (Life
Technologies Inc.). Plasmid DNA was transfected into HepG2 cells
using the FuGENE6 transfection reagent according to the manufacturer's
instructions (Roche Molecular Biochemicals). Briefly, 24-well
culture plates (15 mm diameter) were inoculated with 7 x 10 s cells
24 hr prior to transfection. Cells were transfected overnight in serum-free
DMEM/F-12 with 100 ng of reporter construct, 32 ng of pβ-actin-SPAP,
and 0-400 ng of receptor expression vectors (adjusted
to 400 ng with carrier plasmid). Following transfection, the medium
was aspirated and the cells were cultured for a further 48 hr in
DMEM/F-12 supplemented with 10% heat-inactivated FBS. SPAP 
and luciferase values were determined as described above. 
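
The normalization and DNA-balancing arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the function names are hypothetical, and the example readings used in the note below are invented. Only the plasmid amounts (5, 20, and 8 ng, brought to 80 ng with carrier) come from the text.

```python
from statistics import mean, stdev


def normalize(luciferase, spap):
    """Divide each well's luciferase reading by its SPAP internal control."""
    return [luc / ctrl for luc, ctrl in zip(luciferase, spap)]


def fold_induction(treated, vehicle):
    """Mean and S.D. of treated wells relative to the mean of vehicle wells."""
    baseline = mean(vehicle)
    ratios = [value / baseline for value in treated]
    return mean(ratios), stdev(ratios)


def carrier_ng(total_ng, *component_ng):
    """Carrier plasmid (ng) needed to bring a transfection mix up to total_ng."""
    used = sum(component_ng)
    if used > total_ng:
        raise ValueError("components exceed the target DNA amount")
    return total_ng - used


# CV-1 mix from the text: 5 ng receptor vector, 20 ng reporter, 8 ng SPAP control
print(carrier_ng(80, 5, 20, 8))   # 47
```

With invented triplicate readings, `fold_induction(normalize([7000, 7700, 6300], [1.0, 1.1, 0.9]), normalize([1000, 1100, 900], [1.0, 1.0, 1.0]))` gives a fold induction of about 7. Dividing by the SPAP signal corrects for well-to-well differences in transfection efficiency before treated and vehicle wells are compared.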

Primary Culture of Human and Rat Hepatocytes 
and Northern Blot Analysis 

Primary human hepatocytes were obtained from Dr. Steve Strom 
(University of Pittsburgh). Rat hepatocytes were isolated as described
elsewhere (LeCluyse et al., 1996). Cells (1.5 x 10*) were
cultured on Matrigel-coated 6-well plates in serum-free Williams'
E medium supplemented with 100 nM dexamethasone, 100 U/ml
penicillin G, 100 µg/ml streptomycin, and insulin-transferrin-selenium
(ITS-G, Life Technologies Inc.). Twenty-four hours after isolation,
hepatocytes were treated with either GW4064 (0.1-10 µM) or
CDCA (1-100 µM), which were added to the culture medium as
1000x stocks in dimethyl sulfoxide. Control cultures received vehicle
alone. Cells were cultured for a further 48 hr prior to harvest,
and total RNA was isolated using a commercially available reagent 
(Trizol, Life Technologies Inc.) according to the manufacturer's instructions.
Total RNA (10 µg) was resolved on a 1% agarose/2.2 M
formaldehyde denaturing gel and transferred to a nylon membrane
(Hybond N+, Amersham Pharmacia Biotech Inc., Piscataway, NJ).
Blots were hybridized with 32P-labeled cDNAs corresponding to human
SHP-1 (GenBank Accession Number L76571), human CYP7A1
(bases 99-1564, GenBank Accession Number M93133), mouse 
SHP-1 (bases 30-783, GenBank Accession Number L76567), or rat 
CYP7A1 (bases 235-460, GenBank Accession Number J05460). 



Subsequently, blots were stripped and reprobed with a radiolabeled
β-actin cDNA (CLONTECH Laboratories).

Electrophoretic Mobility-Shift Assays

Electrophoretic mobility-shift assays (EMSA) were performed essentially
as described elsewhere (Lehmann et al., 1997). hFXR and
hRXRα were synthesized from pSG5-hFXR and pSG5-hRXRα expression
vectors, respectively, using the TNT T7 Coupled Reticulocyte
System (Promega). Unprogrammed lysate was prepared using
the pSG5 expression vector (Stratagene). Binding reactions contained
10 mM HEPES (pH 7.8), 60 mM KCl, 0.2% Nonidet P-40,
6% glycerol, 2 mM dithiothreitol (DTT), 2 ng of poly(dI-dC)·poly(dI-dC),
and 1 µl each of synthesized hFXR or hRXRα. Control incubations
received unprogrammed lysate alone. Reactions were preincubated
on ice for 10 min prior to the addition of [32P]-labeled
double-stranded oligonucleotide probe (0.2 pmol). Competitor oligonucleotides
were added to the preincubation at 5-, 25-, and 75-fold
molar excess. Samples were held on ice for a further 20 min,
and the protein-DNA complexes resolved on a pre-electrophoresed
5% polyacrylamide gel in 0.5x TBE (45 mM Tris-borate, 1 mM EDTA)
at room temperature. Gels were dried and autoradiographed at
-70°C for 1-2 hr. The following double-stranded oligonucleotides
were used as probes and competitors in EMSA: rSHP,
5'-gatcCCTGGGTTAATAACCCTGT-3'; mSHP,
5'-gatcCCTGGGTTAATGACCCTGT-3'; hSHP,
5'-gatcCCTGAGTTAATGACCTTGT-3'; mI-BABP,
5'-gatcTTAAGGTGAATAACCTTGG-3'; hI-BABP,
5'-gatcCCAGGTGAATAACCTCGG-3' (Grober et al., 1999); and mSHPmut,
5'-gatcCCTGGaaTAATGttCCTGT-3'.

GST Pull-Down Assays 

GST-SHP-1 fusion protein was expressed in BL21(DE3)pLysS cells
and bacterial extracts prepared by one cycle of freeze-thaw of the
cells in protein lysis buffer containing 50 mM Tris (pH 8.0), 250
mM KCl, 1% Triton X-100, 10 mM DTT and 1x Complete Protease
Inhibitor (Roche Molecular Biochemicals) followed by centrifugation
at 40,000 x g for 30 min. Glycerol was added to the resultant supernatant
to a final concentration of 10%. Lysates were stored at -80°C
until use. [35S]-labeled human LRH-1 or human RXRα was generated
using the TNT T7 Coupled Reticulocyte System (Promega) in the presence
of Pro-Mix (Amersham Pharmacia Biotech Inc.). Coprecipitation
reactions included 25 µl of lysate containing GST-SHP-1 fusion
protein or control GST; 25 µl of incubation buffer (50 mM KCl, 40
mM HEPES [pH 7.5], 5 mM β-mercaptoethanol, 0.1% Tween 20 and
1% nonfat dry milk); and 5 µl of [35S]-labeled LRH-1 or RXRα. The
mixtures were incubated for 25 min with gentle rocking at 4°C
prior to the addition of 20 µl of glutathione-Sepharose 4B beads
(Amersham Pharmacia Biotech Inc.) that had been extensively
washed in protein lysis buffer. Reactions were incubated at 4°C with
gentle rocking for a further 20 min. The beads were pelleted at 3000
rpm in a microfuge and washed four times with protein incubation
buffer. Following the final wash, the beads were resuspended in 25 µl
of 2x SDS-PAGE sample buffer containing 50 mM DTT. Samples
were heated to 100°C for 5 min and resolved on a 10% acrylamide
gel. Autoradiography was performed overnight.

Statistical Analyses 

Unless otherwise stated, data are expressed as mean ± standard 
deviation (S.D.). The significance of differences in SHP-1 and
CYP7A1 expression between vehicle- and GW4064-treated animals
was analyzed using an unpaired Student's t-test.
The rapid proliferation and identification of newly cloned GPCRs
reveal a much greater diversity within this supergene family than
was previously considered at the pharmacological level.



Molecular Biology of 
G-Protein-Coupled Receptors 



by Norman H. Lee 

and Anthony R. Kerlavage 



The transfer of information across 
the cell plasma membrane is a critical
feature for the proper functioning of liv- 
ing cells. For many hormones, neuro- 
transmitters and chemotactic factors, 
signal transduction is accomplished 
through the specific interaction of these 
bioactive molecules (agonists) with 
cell-surface receptors that couple to 
guanine nucleotide-binding regulatory 
proteins (G-proteins) (for a review see 
reference 1). The consequence of receptor
occupancy by agonist is the generation
of an intracellular second messenger
signal that causes the cell to respond
in an appropriate manner. G-protein- 
coupled receptors (GPCRs) play a key 
role in many physiologic processes, in- 
cluding nerve-to-nerve transmission, 
cardiac and smooth muscle contraction/ 
relaxation, endocrine and exocrine se- 
cretion and chemotaxis. The fact that 
GPCRs mediate a broad spectrum of
cellular events makes these proteins an



ideal target for drug interaction and 
therapeutics. 

As with all members of the GPCR 
gene family, the mechanism of signal 
transduction involves receptor coupling 
to a G-protein (for reviews see references
2 and 3). G-proteins are heterotrimeric
proteins formed of a single
GDP-bound α-subunit, one β-subunit
and one γ-subunit. In response to agonist
binding, GPCRs undergo a
change in conformation (receptor-activated
state) that triggers the formation
of an agonist/receptor/G-protein ternary
complex. Concomitant to ternary
complex formation is the exchange of
GDP for GTP on the α-subunit, thereby
freeing the α-subunit from the βγ-subunits.
Consequently, the GTP-containing
α-subunit (and in some cases the βγ-subunits)
acts to stimulate or inhibit an
array of effector enzymes including 
adenylyl and guanylyl cyclase, phos- 
pholipase A and C, phosphodiesterases 
and ion channels. Termination of the 
signal transduction cascade is accom- 
plished by the intrinsic GTPase activity 
found on the α-subunit. Hydrolysis of



bound GTP to GDP and inorganic
phosphate leads to reassociation of the
α-subunit with the βγ-subunits and dissociation
of the agonist/receptor/G-protein
complex.

The first member of the GPCR gene 
family whose sequence was elucidated 
was the visual photoreceptor rhodopsin. 4,5
During the past ten years, the
number of cloned receptors has steadily 
risen and now approaches 200. 6 These 
proteins are single polypeptides ranging 
in size from about 400-1000 amino 
acids. The activating ligand for GPCRs 
varies widely in character (Table I), yet 
these receptors share a highly conserved 
structure and topography. The hallmark 
feature of GPCRs is the presence of 
seven relatively hydrophobic domains, 
each 20-28 amino acids in length, that 
are presumed to span the lipid bilayer in 
an α-helical arrangement (Fig. 1). For
the most part, it is the membrane-span- 
ning regions which exhibit the greatest 
degree of amino acid sequence identity, 
ranging from 20% to more than 50%, 
depending on which receptor proteins 
are being compared. 7

TABLE I: ENDOGENOUS LIGANDS FOR G-PROTEIN-COUPLED RECEPTORS

Biogenic amines/neurotransmitters
    Acetylcholine
    Adenosine
    Dopamine
    Epinephrine
    Glutamate
    Histamine
    Norepinephrine
    Octopamine
    Serotonin

Peptides/peptide hormones
    Angiotensin
    Bombesin-like peptides (neuromedin B, gastrin-releasing peptide)
    Bradykinin
    C5a anaphylatoxin
    Calcitonin
    Endothelin
    N-formyl peptide
    Interleukin-8
    Neuromedin K (also known as neurokinin B)
    Neuropeptide Y
    Neurotensin
    Parathyroid hormone/parathyroid related peptides
    Secretin
    Somatostatin
    Substance K (also known as neurokinin A)
    Substance P
    Thyrotropin-releasing hormone
    Vasoactive intestinal peptide

Glycoprotein hormones
    Follicle-stimulating hormone
    Lutropin/choriogonadotropin
    Thyroid-stimulating hormone (also known as thyrotropin)

Regulatory factors
    cAMP
    Cannabinoids
    Platelet-activating factor
    Thromboxane A2
    Thrombin
    Yeast-mating factors (a and alpha-pheromones)

Miscellaneous
    Light
    Odorants

More divergent are the extracellular amino and intracellular
carboxyl-terminal regions, as
well as the six hydrophilic regions that
connect the hydrophobic domains of the
receptor to form alternating extracellular
(e1, e2, e3) and intracellular (i1, i2,
i3) loops (Fig. 1). This current model for
the tertiary structure of GPCRs is based 
on analogy with bacteriorhodopsin, a 
light-activated proton pump whose 
three-dimensional structure was deduced
from electron microscopy. 8,9

Fig. 1. Model of the structural domains of G-protein-coupled receptors. The transmembrane domains
are depicted as cylinders perpendicular to the plane of the plasma membrane. Transmembrane domains
1-7 (TM1-TM7) are proposed to traverse the membrane in an alpha-helical fashion and be
connected by alternating extracellular (e1-e3) and intracellular (i1-i3) loops. The amino- (NH2) and
carboxyl- (COOH) terminal regions of G-protein-coupled receptors are situated at the extracellular
and intracellular sides of the plasma membrane, respectively.

The structure of bacteriorhodopsin is seen as
having seven transmembrane α-helices
connected by hydrophilic loops, with
the transmembrane domains being arranged
in bundles perpendicular to the
lipid bilayer. In addition, both bacteriorhodopsin
and the GPCR rhodopsin
contain the light-absorbing molecule
11-cis-retinal. A conserved Lys residue
found in the same relative position on
transmembrane domain 7 (TM7) in bacteriorhodopsin
and rhodopsin serves as
the covalent attachment point for the
chromophore. Although bacteriorhodopsin
does not belong to the family of
GPCRs, the structural similarities between
these two classes of proteins are
noteworthy.

Based on primary sequence analy- 
sis, members of the GPCR gene family 
can be categorized into distinct subfam- 
ilies (Figs. 2 and 3). These include re- 
ceptors that bind the biogenic amines 
(e.g., epinephrine, dopamine, acetylcholine),
glycoprotein hormones (e.g.,
thyrotropin, follicle-stimulating hor- 
mone, lutropin/choriogonadotropin) 
and neurokinins (e.g., substance P, sub- 



stance K, neuromedin K). The recent 
cloning of the calcitonin, parathyroid 
hormone and secretin receptors repre- 
sents the delineation of yet another sub- 
family of GPCRs. These receptors are 
more closely related to one another (up 
to 42% sequence identity) than to any of 
the other seven transmembrane-spanning
GPCRs (less than 12%). 10-12 In
many instances, a receptor within a subfamily
can be further divided into
subtypes, each encoded by a separate
gene. For example, muscarinic acetylcholine
receptors (mAChRs) comprise
at least five distinct subtypes (m1, m2,
m3, m4, and m5). 13 Similarly, discrete
molecular subtypes of the dopamine receptor
have been described (D1, D2, D3,
D4, D5). 14
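
The percent sequence identity figures quoted above can be illustrated with a short calculation over two pre-aligned sequences. This is a minimal sketch, not the procedure used by the authors: the function name, the gap character, the choice to skip gapped columns, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared columns that hold the same residue.

    Both inputs must already be aligned to the same length. Columns in
    which either sequence has a gap are skipped (one of several
    conventions for the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy, made-up fragments (not receptor sequences): 8 of 10 positions match.
print(percent_identity("ACDEFGHIKL", "ACDQFGHVKL"))
```

Because the result depends on the alignment and on how gaps are counted, values computed this way are comparable only when the same convention is used throughout.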

During the past five years, consid- 
erable insights have been gained into 
the structure-function relationship of 
GPCRs through the construction of mutant
receptor genes. 1,15 Inferences
about receptor structure and function 
have been deduced from the phenotypes 
of the mutant proteins.

Fig. 2. Relative homology of G-protein-coupled receptors. Sequences were aligned using CLUSTAL 60
and refinements to the alignment were made manually. The dendrogram was created using the
DeSoete Tree Fit 61 and TreeTool (Mike Maciukenas, University of Illinois, unpublished). Only the
aligned transmembrane regions were used in the distance calculations. The lengths of the lines are
proportional to the percent difference between any two given sequences. All programs were run using
the Genetic Data Environment (Steve Smith, Harvard University, unpublished). The considered receptors
are as follows: hm1mAChR, human m1 muscarinic 62; hA1aAR, human alpha1a-adrenergic 63;
hB2AR, human beta2-adrenergic 64; hD1DR, human D1 dopamine 65; mDOR, mouse delta-opiate 66;
hFMLPR, human N-formyl peptide 67; hSKR, human substance K 68; hSPR, human substance
P 69; hFSHR, human follicle stimulating hormone 70; hLH/CGR, human lutropin/choriogonadotropin 71;
hTSHR, human thyrotropin 71; hOLFR, human olfactory 73; hOPS, human rhodopsin 3; pCTR,
porcine calcitonin 10; hSCR, human secretin 12.

In vitro mutagenesis of GPCRs has been used to 1)
identify the amino acids critical for li- 
gand binding; 2) determine the domains 
on the receptor responsible for interact- 
ing with G-proteins; and 3) analyze the 
molecular basis of receptor desensitiza- 
tion. By using molecular modeling 
techniques in conjunction with infor- 
mation gained by mutational analysis, a 
better understanding of the roles played 
by various regions of the receptor pro- 
tein will provide the rationale for future 
drug design. 16-18



Amino-terminal domain and
extracellular loops 

An interesting aspect concerning a 
number of GPCRs is the apparent lack 
of an amino-terminal signal peptide se- 
quence. The signal peptide has been 
demonstrated to be essential for the 
proper function of integral and secreted 
proteins, 19 suggesting that an internal 
signal sequence must exist on those 
GPCRs lacking an amino-terminal one.
In contrast, for GPCRs containing a 
large amino-terminal domain (more 
than 300 amino acids), such as the metabotropic
glutamate receptor (mGlutR)
and glycoprotein hormone receptors,
the presence of a signal sequence has
been noted on the amino-terminus. 20-22
Indeed, the existence of an amino-ter- 
minal signal has been confirmed exper- 
imentally where the first 26 amino acids 
deduced from the cDNA sequence of 
the lutropin/choriogonadotropin recep- 
tor (LH/CG-R) are absent on the amino 
acid sequence derived from purified 
LH/CG-Rs. 23 

Within the amino-terminal domain 
of all GPCRs are two or more consensus 
sequences (Asn-X-Ser/Thr) for N-linked
glycosylation. For biogenic
amine receptors, it is apparent that N-linked
glycosylation is not crucial in ligand
(agonist and antagonist) binding
or receptor/G-protein coupling. For example,
treatment of purified β-adrenergic
receptors (βAR) with endoglycosidases
to remove carbohydrate
moieties has no apparent effect on the ligand
binding and coupling properties of
the reconstituted receptor. 24,25 Inhibitors
of N-linked glycosylation (e.g., tunicamycin)
are equally impotent in affecting
ligand binding to newly
synthesized receptors. 26 Similar results
are seen with the expression of mutant
βARs and mAChRs. 27,28 It is likely that
glycosylation is essential for the subcellular
distribution of some, but not necessarily
all, GPCRs. In the case of
βARs, mutant receptors lacking consensus
glycosylation sites do not traffic
correctly to the cell surface. 27 Whether
the trafficking defect is due to a de- 
crease in the translocation of receptors 
from internal stores to the cell surface or 
an increase in the rate of cell surface re- 
ceptor internalization remains to be re- 
solved. 
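
The Asn-X-Ser/Thr consensus mentioned above is a simple pattern, so candidate sites can be located with a regular expression. The sketch below is an illustration under stated assumptions: the function name and the toy sequence are hypothetical, and the optional exclusion of proline at the X position is a commonly used refinement that is not stated in this article.

```python
import re


def find_sequons(sequence, exclude_proline=False):
    """Return 1-based positions of Asn in Asn-X-Ser/Thr matches.

    exclude_proline=False follows the consensus exactly as written in
    the text; True additionally rejects Pro at the X position (an
    assumption, not taken from the article).
    """
    middle = "[^P]" if exclude_proline else "[A-Z]"
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile("(?=N" + middle + "[ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Made-up amino-terminal fragment used only to exercise the pattern.
toy = "MNGSAANPTQNNTS"
print(find_sequons(toy))
print(find_sequons(toy, exclude_proline=True))
```

A match only marks a potential site; as the text notes for the βAR, whether a site is used and what it contributes must be established experimentally.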

GPCRs whose endogenous ligands 
are biogenic amines lack significant 
amino-terminal domains (less than 50 
amino acids). Early studies focused at- 
tention on this region as a potential can- 
didate for the ligand binding domain. 
The role of the extracellular domains in 
ligand binding is best exemplified by 
the βAR, a prototypical biogenic amine
receptor.

Fig. 3. Aligned amino acid sequences of the seven transmembrane domains (TM 1-7) and adjacent residues of G-protein-coupled receptors. Bold residues
represent highly conserved amino acids. Shaded residues represent conserved residues within a subfamily of receptors. The considered receptors are as
follows: hbeta1AR, human beta1-adrenergic 74; hamalpha1bAR, hamster alpha1b-adrenergic 75; halpha2aAR, human alpha2a-adrenergic 41; h5HT1aR,
human 5-HT1a 76; mTRHR, mouse thyrotropin releasing hormone 77; rmGlutR1, rat metabotropic glutamate receptor 1 22; and rmGlutR2, rat metabotropic
glutamate receptor 2. References for remaining sequence data can be found in Figure 2.

When solubilized βAR is
treated with proteolytic enzymes, the
resulting hydrophobic core retains its
capacity to bind the antagonist
[125I]-iodocyanopindolol (ICYP). 29
Furthermore, the cryptic core is still able
to activate the G-protein Gs in response
to agonists, which suggests that the hy- 
drophilic extracellular regions of the 
receptor are not crucial for receptor- 
ligand interactions. Utilization of ge- 
netic techniques has further delineated 
the role of the extracellular domains on 
biogenic amine receptors in ligand 
binding. Deletion mutagenesis of the 
β2AR revealed that, for the most part,
the amino- and carboxyl-terminal domains
and e1, e2 and e3 do not contribute
to the binding of ICYP and the agonist
isoproterenol. 30,31 In contrast, removal
of any of the transmembrane domains
practically abolishes ligand
binding. It is apparent from these stu- 
dies that the binding domain of at least 
one subfamily of GPCRs (biogenic 
amine receptors) does not involve the 
extracellular hydrophilic regions, but 
actually resides in the transmembrane 
domains. The same is likely true for the 
receptors that bind small peptide hormones,
but confirmation awaits future
experiments. For the glycoprotein hormone
receptors, the large amino-terminus
(more than 300 amino acids) contains
14 imperfect Leu-rich repeat
domains. 20,21,23 It is thought that the
large glycoprotein hormones (28-38 



kDa) bind to this repeat structure before 
interacting secondarily with the mem- 
brane-spanning regions. Through the 
construction of chimeric receptors between
members of this receptor subfamily,
the extracellular amino-terminal
domain has been established as the
ligand binding site. 32,33 In fact, the extracellular
domain of the LH/CG-R (minus
the remainder of protein) can be expressed
in transfected cells that
consequently bind choriogonadotropin 
with high affinity. 34 

A structural feature shared by all 
GPCRs is the presence of a conserved 
Cys residue on el and another on e2. 
These residues have been implicated in
the formation of a disulfide bond, since
replacement of either one of these residues
at position 106 (Cys106) or 184
(Cys184) with Val in the β2AR produces
a mutant receptor with altered agonist
binding properties. 30 Similarly,
mutation of Cys98 or Cys178 in the
m1mAChR, and Cys110 or Cys187 in
rhodopsin, completely abolishes ligand
binding. 35,36 Peptide sequencing of the
m1mAChR has confirmed the involvement
of these conserved Cys residues in
disulfide bond formation. 37 From these
studies, it is believed that the disulfide 
linkage does not participate directly in 
ligand binding per se, but rather serves 
a physical role by maintaining the tertia- 
ry structure of GPCRs. 

Conserved amino acids in the 
transmembrane domains 

Comparison of the deduced amino 
acid sequences of members of the 
GPCR gene family has led to the identi- 
fication of conserved residues located in 
several transmembrane domains (Fig. 
3). Some residues appear to be globally 
conserved in the majority of GPCRs de- 
spite major structural differences in the 
endogenous ligands that bind to this 
family of receptors (e.g., catechola- 
mines, peptides, glycoprotein hor- 
mones). One hypothesis is that these 
highly conserved residues play a com- 
mon functional or structural role, for ex- 
ample, in the process of receptor activa- 
tion. Among the conserved residues 
found throughout the GPCRs are the 
Gly-Asn pair in TM1, a Leu-Ala-X-X- 
Asp-Leu motif in TM2, an almost ca- 
nonical motif of Asp-Arg-Tyr at the 
TM3/i2 junction, an invariable Trp resi- 
due in TM4, and a Pro residue flanked 
by aromatic amino acids in TM5, TM6 
and TM7. Interestingly, the secretin/ 
parathyroid hormone/calcitonin receptor
subfamily and the mGluR subfamily
are practically devoid of these conserved
amino acids. In contrast to the
globally conserved residues, other ami- 
no acids (found only in a subfamily of 
GPCRs) are postulated to be involved in 
receptor class-specific functions, such 
as the binding of biogenic amine li- 
gands. These include the conserved Asp 
residue in TM3 and a Ser-X-X-Ser mo- 



tif in TM5 of the biogenic amine recep- 
tors (Fig. 3). 
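
The conserved motifs listed above can be written as simple patterns and searched for in a sequence. The following sketch assumes one-letter amino acid codes and treats X as any single residue; the dictionary keys, the function name, and the toy sequence are hypothetical and are not taken from Fig. 3.

```python
import re

# Motifs named in the text, written as regular expressions ("." = any residue X).
MOTIFS = {
    "TM2 Leu-Ala-X-X-Asp-Leu": r"LA..DL",
    "TM3/i2 Asp-Arg-Tyr": r"DRY",
    "TM5 Ser-X-X-Ser": r"S..S",
}


def find_motifs(sequence):
    """Map each motif name to the 1-based start positions of its matches."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # Lookahead so that overlapping occurrences are all reported.
        regex = re.compile("(?=" + pattern + ")")
        hits[name] = [match.start() + 1 for match in regex.finditer(seq)]
    return hits


# Made-up string containing one copy of each motif, for illustration only.
toy = "GNLLVLAVADLFMASSIVSFDRYLA"
for name, positions in find_motifs(toy).items():
    print(name, positions)
```

This scan ignores where in the receptor a match falls; a fuller analysis would first delimit the transmembrane domains and search each motif only within its own domain.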

As mentioned above, one of the gen- 
eral features common to a majority of 
GPCRs is the presence of a pair of Asp 
residues, one located in the Leu-Ala-X- 
X-Asp-Leu motif of TM2 and the other 
situated in the Asp-Arg-Tyr motif at the
junction of TM3 and i2 (Fig. 4). The importance
of these two Asp residues in receptor
function has been well documented
for the β2AR, m1mAChR,
α2A-adrenergic receptor (α2AAR) and
dopamine D1 receptor. 38-40 Expression
of the human α2AAR gene in Chinese
hamster ovary cells, cells that normally
lack endogenous adrenergic receptors,
leads to a pertussis toxin-sensitive inhibition
of adenylyl cyclase activity following
epinephrine exposure. 41 In pertussis
toxin-pretreated cells, however,
agonist-mediated activation of α2AAR
leads to an increase in cAMP levels. 41
Substitution of Asp79 with asparagine
(Asn) in TM2 of the α2AAR produces a
mutant receptor displaying high-affinity
agonist binding and relatively normal
antagonist binding properties. 38
However, the ability of adrenergic agonists
to attenuate adenylyl cyclase activity,
as well as enhance cAMP levels
in pertussis toxin-pretreated cells, is
abolished. Consistent with the inability
of agonist to activate mutant
[Asn79]α2AARs was the observed lack
of guanine nucleotide-sensitive high-affinity
agonist binding. Asp130 at the
TM3/i2 junction of the α2AAR also appears
to influence receptor/G-protein
coupling. Mutation of this residue to 
Asn eliminates high-affinity, guanine 
nucleotide-sensitive agonist binding. 
Moreover, agonist-mediated inhibition 
of adenylyl cyclase activity is markedly 
attenuated, while elevation of cAMP 
levels is abolished in pertussis toxin- 
treated cells. 

Similar Asp-to-Asn mutations in the
corresponding positions of the β2AR
and m1mAChR either significantly attenuate
or completely eliminate the
ability of the mutant receptors to activate
adenylyl cyclase and phospholipase
C activities, respectively. 42-44 On
the other hand, the effects of these mutations
on ligand binding were nominal.
Whereas muscarinic agonist and antagonist
binding are relatively unaffected
in the [Asn71]m1mAChR and
[Asn122]m1mAChR mutants, adrenergic
agonist affinity is decreased in
[Asn79]β2AR and slightly increased in
[Asn130]β2ARs. 42-44 Taken together,
these studies suggest that the conserved
Asp residues in TM2 and at the TM3/i2
junction are crucial for agonist-induced
receptor activation or receptor conformational
changes. It has previously
been speculated that these invariant
negatively charged residues may bind
cations and serve as a "charge relay system"
during receptor activation by agonists. 45
It is plausible that the movement
of these ions is key to receptor conformational
changes following agonist
binding. In fact, Asp79 is known to be
involved in sodium-dependent allosteric
regulation of α2AAR. 46 Interestingly,
the mutant [Asn79]α2AAR was found
to couple to inhibition of adenylyl cyclase
and calcium currents but not to potassium
channel activation in AtT20
mouse pituitary tumor cells, suggesting
that α2AARs undergo different conformations
to couple to different G-proteins. 47

There exists in many G-protein- 
coupled receptors an Asp residue si- 
tuated near the extracellular side of 
TM3 (Fig. 4). Replacement of Asp113
with Asn in the α2AAR abolishes yohimbine
binding and markedly decreases
agonist stimulation of the mutant
receptor. 38 Mutation of the
corresponding Asp residue in both
β2ARs and m1mAChRs likewise affects
ligand binding. 42-44 It is unlikely
that mutation of this residue alters normal
receptor processing and insertion
into the lipid bilayer, since
[Asn113]β2AR can be detected by immunoblotting
in membrane preparations. 48
These findings are consistent
with the hypothesis that this Asp residue
that is conserved among all biogenic
amine receptors, including the αAR,
βAR, mAChR, dopamine receptor and
serotonin receptor, is involved in an
electrostatic interaction with the cationic
amine moiety of their respective ligands.

Fig. 4. Conservation of aspartate residues in TM2 and TM3 among members of the biogenic amine receptor subfamily. The numbering and location of
the conserved aspartate residues are depicted using a model of the human alpha2-adrenergic receptor. References for the sequence data can be found
in reference 1. (From: Wang, C-D., Buck, M.A. and Fraser, C.M. Mol Pharmacol 1991, 40: 168-79; reproduced with permission.)

Ser residues in TM5 are conserved
as a pair (Ser-X-X-Ser motif) among 
biogenic amine receptors that bind cate- 
cholamines but not in those receptors 



whose endogenous ligand lacks a catechol
moiety (e.g., acetylcholine) (Fig.
5). Structure-function analysis of the
β2AR has implicated the hydroxyl side-chain
of Ser204 and Ser207 in hydrogen
bond formation with the meta- and
para-hydroxyl groups of catecholamines. 49
Substitution of either Ser residue
with alanine (Ala) attenuates the activity
of catecholamine agonists at the
mutant receptors. The effects of these 
mutations on agonist activity can be 
mimicked by the interaction of meta- 
and/Mxra-hydxoxyl-substitnted analogs 
with the wild-type receptor. Hence, at 
the [Ala204]p2AR mutant, isoprotere- 
Human β1AR      AYAIASSVVSFYVPLCIMAFVYL
Human β2AR      AYAIASSIVSFLVPLVIMVFVYS
Human α2AAR     WYILSSCIGSFFAPCLIMGLVYA
Human α2 (C-4)  WYILSSCIGSFFAPCLIMGLVYL
Hamster α1AR    FYALFSSLGSFYIPLAVILVMYC
Rat D2DR        AFVVYSSIVSFYVPFIVTLLVYI
Dros octop      GYVIYSSLGSFFIPIArMTIVYI
Rat 5HT-1A      GYTIYSTFGAFYIPLLLMLVLYG
Rat 5HT-1C      NFVLIGSFVAFFIPLTIMVITYF
Human m1        IITFGTAMAAFYLPVTVMCTLYN
Human m2        AVTFGTAIAAFYLPVIIMTVLYW
Human opsin     SFVIYMFVVHFIIPLIVIFFCYG

[Figure 5 graphic, bottom panel: ligand binding site models for (A) the Alpha2-Adrenergic Receptor and (B) the Beta2-Adrenergic Receptor; not legible in the scanned copy.]

Fig. 5. Conservation of serine residues in TM5 among certain members of the biogenic amine receptor
subfamily. Top: Alignment of the deduced amino acid sequences from TM5 of selected G-protein-coupled
receptors. Amino acid sequences were aligned to maximize homologies within this region.
[symbol illegible], conserved serine residues in a Ser-X-X-Ser motif. References for the sequence data can be
found in reference 1. Bottom: Model comparing the ligand binding site of the alpha2-adrenergic (A)
and beta2-adrenergic (B) receptors. The view of the receptors is from the extracellular face of the plasma
membrane. The seven alpha-helices are numbered I-VII. Locations of the conserved aspartate
(Asp) and serine (Ser) residues implicated in ligand binding are indicated. Ligand binding model for
the beta2-adrenergic receptor has been adapted from reference 48 or 51. (From: Wang, C-D., Buck,
MA and Fraser, CM. Mol Pharmacol 1991. 40: 168-79; reproduced with permission.)



nol and its meta-substituted analog display only partial agonist activity,
whereas the para-substituted analog exhibits no intrinsic agonist activity.
Conversely, isoproterenol and its para-substituted analog show partial agonist
activity at the [Ala207]β2AR mutant, but the meta-substituted analog is
devoid of activity.
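
The Ser-X-X-Ser motif described above can be checked mechanically against a TM5 fragment. The following is only a minimal sketch: the function name is hypothetical, and the two sequences are copied as they appear in the scanned Fig. 5 alignment, so they may carry transcription errors.

```python
import re

# Sequences as transcribed from the TM5 alignment in the scanned Fig. 5;
# they may contain OCR errors and are used here only for illustration.
TM5 = {
    "Human beta2AR": "AYAIASSIVSFLVPLVIMVFVYS",
    "Human m2": "AVTFGTAIAAFYLPVIIMTVLYW",
}


def find_ser_x_x_ser(sequence):
    """Return (0-based offset, matched text) for each Ser-X-X-Ser occurrence."""
    # The lookahead allows overlapping occurrences to be reported as well.
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(S..S))", sequence)]


for name, seq in TM5.items():
    print(name, find_ser_x_x_ser(seq))
```

Offsets are positions within the aligned fragment, not receptor residue numbers. On these two rows, the catecholamine-binding β2AR fragment contains the motif and the muscarinic m2 fragment does not, which matches the pattern the text describes.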

In a somewhat analogous manner to that of the [Ala204]β2AR mutant, when
Ser204 is substituted with Ala in the α2AAR, epinephrine and phenylephrine
(meta-substituted) elicit 100% maximal agonist activity at the mutant receptor,
whereas synephrine (para-substituted) displays only partial agonist activity. 38
Based on these findings, it was postulated that Ser204 in the α2AAR functions
in a manner similar to that of Ser207 in the β2AR by participating in
hydrogen bond formation with the para-hydroxyl group from the catechol
ring structure of catecholamine agonists. There exists a second Ser residue
four amino acids upstream from Ser204 in the α2AAR (Fig. 5). Mutation of this
residue at position 200 of the α2AAR produces a mutant receptor that is fully
activated by epinephrine, phenylephrine and synephrine. 38 Thus, Ser200 appears
not to directly participate in the ligand binding process. This finding is
not totally unexpected, since Ser204 and Ser207 in the β2AR are located
three positions apart in TM5, which is presumed to form an α-helix, compared
with a distance of four residues apart for Ser200 and Ser204 in the α2AAR. Since
one turn of an α-helix encompasses 3.6 amino acids, the hydroxyl group of
Ser200 in the α2AAR would assume a different orientation in the helix compared
with Ser204 in the β2AR. Thus, it is possible that the meta-hydroxyl group
of catecholamine agonists interacts with the sulfhydryl side-chain of Cys at
position 201 of the α2AAR, which is located in the same relative position in
TM5 as Ser204 of the β2AR (Fig. 5).
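
The spacing argument above can be made concrete with a little arithmetic. This is a rough sketch under the assumption of an ideal α-helix with exactly 3.6 residues per turn (100 degrees of rotation per residue); real transmembrane helices deviate from this, and the function name is illustrative only.

```python
RESIDUES_PER_TURN = 3.6  # figure quoted in the text for an alpha-helix


def angular_offset(pos_a, pos_b, residues_per_turn=RESIDUES_PER_TURN):
    """Rotation in degrees (0 to 360) around the helix axis from pos_a to pos_b."""
    return ((pos_b - pos_a) * 360.0 / residues_per_turn) % 360.0


# beta2AR: Ser204 relative to Ser207 (three residues apart)
beta2_ser204 = angular_offset(207, 204)
# alpha2A AR: Ser200 and Cys201 relative to Ser204
alpha2a_ser200 = angular_offset(204, 200)
alpha2a_cys201 = angular_offset(204, 201)

print(round(beta2_ser204), round(alpha2a_ser200), round(alpha2a_cys201))
```

Under this idealization, Cys201 of the α2A receptor sits at the same angular offset from Ser204 as Ser204 of the β2AR does from Ser207. Ser200 is rotated a further 100 degrees, which is consistent with the different orientation the text proposes.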

C-terminal domain and the 
intracellular loops 

It has been widely presumed that the cytoplasmic loops of GPCRs form an
interface between the receptor and G-protein. Several lines of evidence,
involving both biochemical and genetic approaches, now lend support for this
hypothesis. Findlay and Pappin 50 revealed early on that proteolytic digestion
of i3 of rhodopsin abolished its interaction with the G-protein transducin,
thus implicating this domain as the major constituent involved in the coupling
process. This finding has been extended to the biogenic amine receptor
subfamily through the use of deletion and site-directed mutagenesis. When a large
33-amino-acid deletion (residues 229-258), corresponding to the middle segment
of i3, was performed on the hamster β2AR, no detectable effect on the ability
of the receptor to stimulate adenylyl cyclase was seen. 51 However, deletion of
the amino- (222-229) and carboxyl- (258-270) terminal portions of this loop
caused marked reductions in agonist-dependent stimulation of adenylyl
cyclase. 51 These two short peptide segments are believed to form amphipathic
helices that interact with Gs during the process of receptor activation.

Several mutations made by O'Dowd et al. 52 indicate that other regions on
the β2AR protein, besides portions of i3, may contribute to receptor
coupling. Deletions in i1 and i2 produced mutant receptors with reduced
capacity to couple to Gs and stimulate adenylyl cyclase, thereby
presupposing a role for these two loops in receptor-G-protein interactions.
Furthermore, mutation of a conserved Cys residue (position 341) in
the cytoplasmic tail of the β2AR impaired the ability of isoproterenol to
stimulate adenylyl cyclase. 53 Cys341 undergoes palmitoylation and the fatty
acid moiety is proposed to insert itself into the lipid bilayer, thus creating an
additional cytoplasmic loop. The amino-terminal segment of this "fourth"
intracellular loop is speculated to play a role in the coupling of the β2AR to
Gs, presumably by maintaining proper orientation of the other G-protein
binding domains. 53

Data obtained on glycoprotein hormone receptors support the general notion
of multiple intracellular regions participating in the coupling process.
Site-directed mutagenesis of the thyrotropin receptor provides evidence on the
importance of i1 and the carboxyl-terminal portions of both i2 and i3 in signal
transduction. 54 In contrast, deletion of two thirds of the carboxyl-terminal end
of the cytoplasmic tail does not functionally impair the thyrotropin
receptor. 54 It is not known with certainty whether the remaining amino-terminal
portion of the tail, like in the β2AR, 53 is important in receptor-G-protein
coupling.

Chimeric receptors have been constructed to identify the intracellular
regions important for defining selective receptor/G-protein interactions.
Studies with chimeric m1/m2 or m2/m3 mAChRs indicate that i3 is sufficient in
determining the selective coupling of these receptor subtypes to their
respective effector enzymes. 55,56 Similar findings have been reported for
chimeric α2/β2- and β2/α1-ARs. 57,58 However, it is likely that multiple
cytoplasmic domains are required for G-protein binding specificity. Wong et al. 59
have shown that substitution of a 12-amino-acid segment (in the amino-terminus of
i3) of the β1AR into the corresponding position of the m1 mAChR is enough to
confer Gs, without disturbing Gp, coupling to the latter receptor. Only upon
additional substitution of the corresponding i2 domains was Gp coupling
to the m1 mAChR abolished. 59 Hence, these data demonstrate the pivotal,
although not exclusive, role of i3 in selective effector coupling.

Concluding remarks 

The rapid proliferation and identifi- 
cation of newly cloned GPCRs reveal a 
much greater diversity within this su- 
pergene family than was previously 
considered at the pharmacological lev- 
el. As more receptors are cloned, the use 
of site-directed mutagenesis in conjunc- 
tion with molecular modeling tech- 
niques will help better define the func- 
tional domains of these proteins. 
Ultimately, it is this knowledge which 
will form the basis for the development
of future therapeutics. 
The Cure: With Big Drugs Dying, Merck Didn't
Merge 

It Found New Ones 

Some Inspired Research, Aided By a Bit of Luck, 
Saves Company's Independence 

The Path to a Novel Painkiller 
By Gardiner Harris 
Staff Reporter of The Wall Street Journal 

For 15 years, Edward Scolnick, head of Merck & 
Co.'s drug research, knew the company would be 
facing a crisis about now. For much of that time, he 
secretly feared that Merck might not survive it as an 
independent company. "I had some doubts that I 
didn't share with anybody," Dr. Scolnick says. 

Merck's problem, which at times has infected almost 
every big pharmaceuticals company, was that patents 
on several of its best-selling drugs would be expiring. 
Generic knockoffs would then eat deeply into market 
share and profits on drugs like Vasotec and Prinivil 
for hypertension, Mevacor for high cholesterol and 
Pepcid and Prilosec for ulcers. 

Ever since investors caught on to this, Wall Street 
has been insisting that Merck join the merger rush 
sweeping the pharmaceuticals industry. But its chief
executive, Raymond V. Gilmartin, steadfastly 
refused, insisting that Merck could grow briskly all 
by itself. 

He was right. Today, well into what was supposed 
to be the crunch, Merck is riding high. It topped all 
its peers in revenue growth last year, and most of 
them in earnings growth, analysts say. Its stock 
surged 26% in 2000 while the broad market was 
skidding. Instead of facing an acute need to save 
money, Merck is increasing its research spending by 
nearly 17% and its sales force by almost a third. 

"The safe thing would have been to seek a merger, 
emphasize generics, stay diversified and cut costs 
across the board," Mr. Gilmartin says. "We went 



against the conventional wisdom at the time, stayed 
with it and did it." 

Mr. Gilmartin, 59 years old, who arrived at Merck 
after heading a medical-device company, gambled
that the pharmaceuticals giant's tradition of creativity 
and innovation in drug discovery would bail it out. A 
merger, by contrast, would dilute the power of this 
science-based culture - one that has been a model for 
other drug companies - and be a distraction for 
years. 

Had he and his lieutenants been wrong, Merck's 
name might have wound up in the same graveyard as 
Warner-Lambert, Upjohn, Syntex, Sandoz, Ciba- 
Geigy, Rhone-Poulenc and Hoechst, all of which had 
to resort to mergers when their labs couldn't produce 
enough new drugs to replace old ones with expired 
patents. 

Merck's success demonstrates that in the drug 
business, as in Hollywood, one big hit can sway the 
fate of an entire company. And searching for 
blockbuster drugs is a matter of inspiration, scientific 
instincts and shrewd management - assets that are 
hard to buy in a merger. 

In this case, the inspiration came from Peppi Prasit, 
a Thai-born medicinal chemist for Merck in 
Montreal. In July 1992, he found himself wandering 
around an obscure medical conference in that city. 
Chatting with a colleague, he learned that Merck 
researchers had developed a lab test to determine if a 
painkiller was less likely to cause the stomach upset 
that goes along with most pain and arthritis drugs. 

Moments later, Dr. Prasit, now 45, noticed a poster 
display from some researchers for a Japanese 
company claiming they had produced just such a 
nonirritating painkiller, though one that wasn't 
chemically fit to try in people. Dr. Prasit immediately
went back to his Montreal lab, cooked up the 
mysterious molecule and put it to Merck's new test. 
When it passed, he set about trying to create a similar 
drug for humans. 

His work caught the eye of Dr. Scolnick, the 
research chief at Merck's sprawling laboratory 
northwest of Philadelphia. Dr. Scolnick, 60, is a 
former molecular biologist for the National Institutes 
of Health who joined Merck in 1982 and became its
top scientist three years later. A dour and fierce man,
he works out of a small office with a view of 
industrial pipes and a door so narrow he has to slide 
sideways to get through. His adjoining conference 
room is decorated only with two bedraggled plastic 
plants. 

The unimpressive surroundings belie the critical 
nature of his job: Dr. Scolnick monitors hundreds of 
intriguing scientific leads floating around Merck's 
labs and decides where the company will make its big 
bets. Merck's winnowing process has evolved over 
three decades into committees of scientists who 
discuss one another's work with brutal frankness. It's 
a system of peer review modeled on one used at the 
NIH, and it "allows a really good debate about what 
we should be doing," says Dr. Scolnick. 

The system has helped Merck, in recent years, to 
bring out new drugs like Fosamax, to slow bone
deterioration in osteoporosis; Singulair for asthma; 
and big-selling medicines for high blood pressure, 
glaucoma, migraine and AIDS. But Dr. Scolnick says 
he didn't need a committee to tell him that Dr. Prasit's 
painkiller project had the potential to be a 
blockbuster, and a critical bridge out of Merck's 
patent problem. 



The class of painkillers called nonsteroidal anti-
inflammatory drugs -- like aspirin and ibuprofen --
hadn't seen a major improvement in years. And 
thousands of Americans suffered ulcers each year 
because of the drugs' side effects. Preventing that 
would clearly be a huge advance. 

These drugs attack the inflammation that leads to 
pain by curbing production of prostaglandins, 
compounds that marshal the body's defenses. But 
prostaglandins also are involved in making the lining 
that protects the gut from digestive acids. The more a 
painkiller inhibited inflammation, the more it thinned 
the protective lining, increasing the risk of bleeding 
ulcers. 



Philip Needleman, a pharmacologist at Washington 
University in St. Louis, had mapped out a potential 
way around this. It would be a drug that inhibited 
Cox-2, an enzyme that regulates prostaglandin
production in most of the body, but not Cox-1, a 
similar enzyme involved only in the gut. Dr. Prasit 



knew of this research and was determined to develop 
just such a drug, especially since Merck had a test for 
it. 



But in this quest, Dr. Prasit and Dr. Scolnick feared 
they were in a race — and running second. Rumors 
swirled that Dr. Needleman, who subsequently 
crossed town to join Monsanto Co., was working on a
similar drug for that company. 

So Dr. Scolnick ordered researchers in Montreal to 
pursue Dr. Prasit's work as fast as they could. "I 
would call up every other day and say, 'Hey, is 
everybody working on this project?'" recalls Dr.
Scolnick with a rare smile. "They would always say, 
'Yes!' You don't know if they're telling the truth, but 
they got the message that it was important." 



Dr. Prasit's team synthesized hundreds of 
compounds, some of which worked great in the test 
tube but passed through laboratory mice with no 
clinical effect. Others mysteriously killed the mice. 
But by October 1994, the team had come up with two
compounds that aced the test-tube tests and didn't 
hurt the mice, even at extremely high doses. 

Normally, Dr. Scolnick would have chosen one of 
these to put through the expensive and risky process 
of testing in humans. But the project was so 
important — and Merck appeared to be in such a 
high-stakes competition with Monsanto — that he 
decided to put both compounds in clinical trials. 

It was a good move, because only one of the two 
ended up working. "One failed and the other didn't, 
and there was no way you could have looked at the 
preclinical data and predicted which one would 
succeed," Dr. Scolnick says. "That's just dumb luck." 

Meanwhile, Mr. Gilmartin, arriving in 1994, had 
other headaches. Growth at the company, so stellar in 
the late 1980s, had slowed. A health plan proposed 
by the Clinton administration threatened price 
controls. Powerful managed-care organizations were 
demanding deep discounts. "There were people that 
were questioning, in a managed-care environment, 
what was going to be the value of breakthrough 
research," the CEO says. "Merck, in fact, had even 
moved into the generics business. Everything was 
being questioned and challenged." 



Copr. © West 2001 No Claim to Orig. U.S. Govt. Works 



1/10/01 WSJ A1

1/10/01 Wall St. J. A1 2001 WL-WSJ 2850599






Among his first moves was to squelch the push into 
generics and sell off specialty-chemicals and 
agricultural subsidiaries. He also ordered his 
managers to make peace with managed-care 
companies. 

Merck had been fighting their demands for 
discounts, with the result that its products were 
increasingly being excluded from the "formulary" 
lists of large buying groups. "Merck was just in your 
face. If you tried to set up a meeting with them, they 
would refuse," says Lynn Detlor, president of 
American Healthcare Systems' Purchasing Partners 
LP, a huge group-purchasing organization. In Mr. 
Gilmartin's first week in 1994, he set up a meeting
with that group's executives and promised that Merck 
would cooperate at every level, say two participants 
at the meeting. Within 18 months, the managed-care 
group had increased its purchases of Merck products 
tenfold on an annualized basis, Mr. Detlor says. 

But most important, Mr. Gilmartin decided then to 
bet the company's future on the productivity of its 
labs. "Shortly after he came here, he came to one of 
our research meetings and stayed for dinner," Dr. 
Scolnick recalls. "As he was leaving, on the way out 
he said he wanted to talk to me. And he said, 'I want 
you to know that I have complete confidence in you. 
Just do your thing and I'm not going to bother you.'"

That meant he had freedom to throw all the 
resources he wanted into Dr. Prasit's project in 
Montreal. 



In January 1995, Merck handed a batch of a 
potential new pain drug to Donald R. Mehlisch, an 
oral surgeon in Austin, Texas, who tests such 
medicines for manufacturers. He recruits students 
from the University of Texas, yanks out their wisdom 
teeth, gives them a pill and puts them into a dorm 
attached to his clinic to watch them suffer. "We 
create a lot of pain in what we do," he says 
cheerfully. 

A test drug's effectiveness is measured largely by 
how long it takes patients to insist that what they 
were given isn't working and they need something 
else. The tests are designed to be "double-blind," with 
neither doctor nor patient knowing which pill is 



which. Even so, Dr. Mehlisch sensed that what he 
was testing for Merck had potential. It was "the first 
time we've ever had a compound that has worked so 
well for so long," he says. 

Meanwhile, Monsanto, which was working on a 
similar drug just as Merck had suspected, ran its 
candidate through similar dental-pain tests. It failed 
them. However, it and Merck's drug were both good 
at relieving the longer-term pain of arthritis, without 
stomach irritation. 



To Merck's dismay, Monsanto completed its clinical 
studies first. Among the reasons was a dosage glitch 
at Merck. The company figured out only belatedly 
that, instead of as much as 1,000 milligrams, the 
proper dose was 12.5 mg. to 25 mg. The pills that 
resulted were so tiny that Merck was afraid arthritis 
patients wouldn't be able to pick them up. It enlarged 
them with edible filler, but that caused another 
problem — the filler turned out to slow the drug's 
absorption. Three months were lost while researchers 
worked to fix all this. 



On the last day of 1998, the Food and Drug 
Administration gave Monsanto approval to market its 
nonirritating painkiller, called Celebrex. In February 
1999, Monsanto began co-marketing it with Pfizer 
Inc. — and it quickly became the most successful 
drug launch in U.S. history. Merck still didn't even 
have marketing approval. 

Normally, a head start like that makes the first drug 
dominant and very hard to catch. Yet the way Merck 
handled its later launch would soon put its drug, 
called Vioxx, hot on Celebrex's trail. One reason: an 
expanded role for marketers within Merck. 

For decades, Merck's marketers hadn't been allowed 
anywhere near scientific-planning meetings. Mr.
Gilmartin's predecessor, P. Roy Vagelos, started to 
change this, persuading scientists to accept marketers 
in their midst by promising that they wouldn't speak. 
Then, speaking was allowed but not encouraged. 
Under Mr. Gilmartin, however, the marketers have 
become deeply involved in many of the scientists' 
development decisions, though they still have no 
involvement in early-stage research issues. 

Mr. Gilmartin created teams of marketing, 
manufacturing and research people that now plan far
ahead. "We deconstructed every task to see where we 
could cut out steps," says Wendy Dixon, a marketing 
vice president who oversaw the Vioxx launch. "We 
carved four or five weeks off the normal product- 
launch process." 

While the team made thousands of bottles and boxes 
in advance, they couldn't do that with the pills' 
instruction flier; the FDA doesn't bless it until the day 
of approval. With the rival drug already a hit, Merck's 
challenge would be to get the approved copy from the 
FDA to print shops across the U.S. and Puerto Rico, 
print the fliers by the thousands, insert them with the 
pills and get the bottles to pharmacies in just days. 
Planes were placed on standby in case printing plates 
needed to be rushed to print shops elsewhere. 



Insiders knew that May 20, a Friday, was likely to 
be the day of FDA approval. Cheryl Ramsey- 
Weldon, the company's top formatter of instruction 
fliers, waited all day on tenterhooks. When her shift 
ended she went home and waited some more. "I was 
sitting by the phone," she says. "I called [her 
supervisor] three times to see how close we were." 



She finally got the call at 10 p.m. Ready for bed 
with her contacts out, her hair up and her pajamas on, 
Ms. Ramsey-Weldon jumped into her car as she was
and raced the 3 1/2 miles to the plant. Four hours 
later, she had formatted the document and passed it 
along to a pair of proofreaders. At 2:30 a.m., she 
went home for a few hours' sleep. By 6 a.m. she was 
back for more. 



Merck's presses ran for days without stop. Then the 
fliers were folded and inserted. The bottles reached 
distribution centers on Monday afternoon. Vioxx was 
stocked in 40,000 pharmacies within 11 days of
approval, a remarkable feat. 

Within three months of its launch, the Merck drug 
gained nearly a third of the brand-new market for 
"Cox-2 inhibitors," according to research firm IMS 
Health, and within a year it had nearly half. In 
Europe, Vioxx is dominant, having beaten Celebrex 
to market in most countries despite filing later. 
Helping Vioxx in the heated two-way competition: It 
acts more quickly than Celebrex and is more 
selective for the Cox-2 enzyme, according to 
independent studies. 

Both drugs have flaws, though. And now Merck is 
locked in another Cox-2 contest, racing Pharmacia 
Inc. -- which took over Monsanto -- to bring out
second-generation, improved versions of the hot- 
selling drugs. 



These days, Mr. Gilmartin has an uncharacteristic 
swagger. "We're going to another level at a time 
when most worried that Merck wouldn't even 
compete," he says. But Dr. Scolnick gives plenty of 
credit to the way things broke Merck's way during 
Vioxx's development. "If those first two compounds 
had failed [in human trials] and we had had by 
chance to rely on the fifth or sixth one" years later, he 
says, "we would be a very different company." 



Prescription for Success 
Merck is gradually losing its exclusive rights to these drugs . . . 

Drug (condition)           1999 U.S. sales    1999 world-wide sales    Expiration*
                           (in millions)      (in millions)

Vasotec (Hypertension)     $975               $2,300                   August 2000
Mevacor (Cholesterol)      $480               $600                     December 2001
Pepcid (Ulcers)            $820               $910                     April 2001
Prinivil (Hypertension)    $715               $815                     June 2002

But the company has drugs in the pipeline 
Launches expected in 2001 

-- Cancidas: intravenous anti-fungal drug; application submitted to 
FDA July 2000 

-- Invanz: intravenous antibiotic; application submitted to FDA 
November 2000 

-- Etoricoxib: Super Vioxx for arthritis; application to FDA expected
early 2001

*End of exclusivity 

NOTE: In November, the patent will also expire on Prilosec, an 
Astra-Zeneca drug for heartburn and gastro-esophageal reflux disease, 
from which Merck receives considerable revenue. 